Statistical Inference

Solved Exercises and Problems of Statistical Inference

David Casado
Complutense University of Madrid
Faculty of Economic and Business Sciences
Department of Statistics and Operational Research II

David Casado de Lucas
June 2015

You can decide not to print this file and consult it in digital format; paper and ink will be saved. Otherwise, print it on recycled paper, double-sided and with less ink. Be ecological. Thank you very much.

Contents
Links, Keywords and Descriptions
Inference Theory (IT)
    Framework and Scope of the Methods
    Some Remarks
    Sampling Probability Distribution
Point Estimations (PE)
    Methods for Estimating
    Properties of Estimators
    Methods and Properties
Confidence Intervals (CI)
    Methods for Estimating
    Minimum Sample Size
    Methods and Sample Size
Hypothesis Tests (HT)
    Parametric
        Based on T
        Based on Λ
    Analysis of Variance (ANOVA)
    Nonparametric
    Parametric and Nonparametric
PE, CI, HT
Additional Exercises
Appendixes
    Probability Theory
        Some Reminders
        Markov's Inequality. Chebyshev's Inequality
        Probability and Moment Generating Functions. Characteristic Function
    Mathematics
        Some Reminders
        Limits
References
Tables of Statistics
Probability Tables
Index

Prologue
These exercises and problems are a necessary complement to the theory included in Notes of Statistical Inference, available at http://www.casado-d.org/edu/notesstatisticalinference-slides.pdf. Nevertheless, some important theoretical details are also included in the remarks at the beginning of each chapter. Those Notes are intended for teaching purposes, and they do not include the advanced mathematical justifications and calculations included in this document. Although we can only study linearly and step by step, it is worth noticing that in Statistical Inference methods are usually related, as tasks are in the real world. Thus, in most exercises and problems we have made clear which suppositions are made and how they should be properly proved. In some cases, several statistical methods have been naturally combined in the statement. Many steps, and even sentences, are repeated in most exercises of the same type, both to insist on them and to make it easier to read the exercises individually. The advanced exercises have been marked with the symbol *. The code with which we have done some calculations, using the programming language R, is written in Courier New font; you can copy and paste this code from the file. To the best of my knowledge, I include some notes to help students whose mother tongue is not English.

Acknowledgements
This document has been created with Linux, LibreOffice, OpenOffice.org, GIMP and R. I thank those who make this software available for free. I donate funds to these kinds of projects from time to time.

Links, Keywords and Explanations
Inference Theory (IT)
Framework and Scope of the Methods
> [Keywords] infinite populations, independent populations, normality, asymptoticness, descriptive statistics.
> [Description] The conditions under which the statistics considered here can be applied are listed.
Some Remarks
> [Keywords] partial knowledge, randomness, certainty, dimensional analysis, validity, use of the samples, calculations.
> [Description] Partial knowledge justifies two things: the random character of the mathematical variables used to explain the variables of real-world problems, and the impossibility of reaching maximum certainty when samples are used instead of the whole population. The validity of the results must be understood within the scenario made up of the assumptions, the methods, the certainty and the data.
Sampling Probability Distribution
Exercise 1it-spd
> [Keywords] inference theory, joint distribution, sampling distribution, sample mean, probability function.
> [Description] From a simple probability distribution, the joint distribution of a sample and the sampling distribution of the sample mean are determined.
Point Estimations (PE)
Methods for Estimating
Exercise 1pe-m
> [Keywords] point estimations, binomial distribution, Bernoulli distribution, method of the moments, maximum likelihood method, plug-in principle.
> [Description] For the binomial distribution, the two methods are applied to estimate the second parameter (the probability) when the first (the number of trials) is known. In the second method, the maximum can be found by looking at the derivatives. Both methods provide the same estimator. The plug-in principle allows this estimator to be used to obtain others for the mean and the variance.
Exercise 2pe-m
> [Keywords] point estimations, geometric distribution, method of the moments, maximum likelihood method, plug-in principle.
> [Description] For the geometric distribution, the two methods are applied to estimate the parameter. In the second method, the maximum can be found by looking at the derivatives. Both methods provide the same estimator. The plug-in principle is applied to use this estimator to obtain others for the mean and the variance.
Exercise 3pe-m
> [Keywords] point estimations, Poisson distribution, method of the moments, maximum likelihood method, plug-in principle.
> [Description] For the Poisson distribution, the two methods are applied to estimate the parameter. In the second method, the maximum can be found by looking at the derivatives. The two methods provide the same estimator. The plug-in principle is applied to use this estimator to obtain others for the mean and the variance.
Exercise 4pe-m
> [Keywords] point estimations, normal distribution, method of the moments, maximum likelihood method.
> [Description] For the normal distribution, the two methods are applied to estimate the two parameters of this distribution simultaneously. In the second method, the maximum can be found by looking at the derivatives. The two methods provide the same estimators.
Exercise 5pe-m
> [Keywords] point estimations, continuous uniform distribution, method of the moments, maximum likelihood method, plug-in principle, integrals.
> [Description] For the continuous uniform distribution, the two methods are applied to estimate the parameter. In the second method, the maximum cannot be found by looking at the derivatives, so it is found by simple qualitative reasoning. The two methods provide different estimators. The plug-in principle allows these estimators to be used to obtain others for the mean and the variance. As a mathematical exercise, the theoretical expressions of the mean and the variance are calculated.
Exercise 6pe-m
> [Keywords] point estimations, translated exponential distribution, method of the moments, maximum likelihood method, plug-in principle, integrals.
> [Description] For a translation of the exponential distribution, the two methods are applied to estimate the parameter. In the second method, the maximum can be found by looking at the derivatives. The two methods provide the same estimator. The plug-in principle is applied to use this estimator to obtain others for the mean. As a mathematical exercise, the theoretical expressions of the mean and the variance of the distribution are calculated.
Exercise 7pe-m
> [Keywords] point estimations, method of the moments, maximum likelihood method, plug-in principle, integrals.
> [Description] For a distribution given through its density function, the two methods are applied to estimate the parameter. In the second method, the maximum cannot be found by looking at the derivatives, so it is found by simple qualitative reasoning. The two methods provide different estimators. The plug-in principle is applied to obtain other estimators for the mean and the variance. Additionally, the theoretical expressions of the mean and the variance of this distribution are calculated.
Properties of Estimators
Exercise 1pe-p
> [Keywords] point estimations, probability, normal distribution, sample mean, completion (standardization).
> [Description] For a normal distribution with known parameters, the probability that the sample mean is larger than a given value is calculated.
Exercise 2pe-p
> [Keywords] point estimations, probability, normal distribution, sample quasivariance, completion.
> [Description] For a normal distribution with known standard deviation, the probability that the sample quasivariance is larger than a given value is calculated.
Exercise 3pe-p
> [Keywords] point estimations, probability, Bernoulli distribution, sample proportion, completion (standardization), asymptoticness.
> [Description] For a Bernoulli distribution with known parameter, the probability that the sample proportion lies between two given values is calculated.
Exercise 4pe-p
> [Keywords] point estimations, probability and quantile, normal distribution, sample mean, sample quasivariance, completion.
> [Description] For two independent normal distributions with known parameters, probabilities and quantiles of several events involving the sample mean or the sample quasivariance are calculated or found, respectively.
Exercise 5pe-p
> [Keywords] point estimations, probability, normal distribution, total sum, completion, bound.
> [Description] For two independent normal distributions with known parameters, the probabilities of several events involving the total sum are calculated.
Exercise 6pe-p
> [Keywords] point estimations, trimmed sample mean, mean square error, consistency, sample mean, rate of convergence.
> [Description] The mean square error and the consistency of the trimmed sample mean, as an estimator of the population mean, are studied. Its rate of convergence is analysed by comparing it with that of the ordinary sample mean.
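The standardization ("completion") used in Exercise 1pe-p can be sketched in a few lines of Python. This is a minimal illustration that assumes a normal population with known mean mu and standard deviation sigma, so that the sample mean follows N(mu, sigma^2/n). The function name and the numbers in the usage lines are hypothetical and are not taken from the exercise.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(c, mu, sigma, n):
    """Return P(sample mean > c) for a sample of size n from N(mu, sigma^2).

    The sample mean follows N(mu, sigma^2 / n), so both sides of the event
    {mean > c} are "completed" (standardized) to reach a standard normal.
    """
    z = (c - mu) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical values: mu = 10, sigma = 2, n = 25, threshold c = 10.8.
    # The standardized threshold is (10.8 - 10) / (2 / 5) = 2.
    print(round(prob_sample_mean_exceeds(10.8, 10, 2, 25), 4))  # prints 0.0228
```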
Exercise 7pe-p
> [Keywords] point estimations, chi-square distribution, mean square error, consistency.
> [Description] For a given estimator of twice the mean of a chi-square population, the mean square error and the consistency are studied.
Exercise 8pe-p
> [Keywords] point estimations, mean square error, relative efficiency.
> [Description] For a sample of size two, the mean square errors of two given estimators are calculated and compared by using the relative efficiency.
Exercise 9pe-p
> [Keywords] point estimations, sample mean, mean square error, consistency, efficiency under normality, Cramér-Rao's lower bound.
> [Description] It is proved that the sample mean is always a consistent estimator of the population mean, provided the population variance is finite. When the population is normally distributed, this estimator is also efficient.
Exercise 10pe-p
> [Keywords] point estimations, continuous uniform distribution, probability function, sample mean, consistency, efficiency, unbiasedness.
> [Description] For a population variable following the continuous uniform distribution, the density function is plotted. The consistency and the efficiency of the sample mean, as an estimator of the population mean, are studied. From the bias obtained, a new unbiased estimator of the population mean is built, and its consistency is proved.
Exercise 11pe-p
> [Keywords] point estimations, geometric distribution, sufficiency, likelihood function, factorization theorem.
> [Description] When a population variable follows the geometric distribution, a minimum-dimension sufficient statistic for studying the parameter is found by applying the factorization theorem.
Exercise 12pe-p *
> [Keywords] point estimations, basic estimators, population mean, Bernoulli distribution, population proportion, normality, population variance, mean square error, consistency, rate of convergence.
> [Description] The mean square error is calculated for all the basic estimators of the mean, of the proportion for Bernoulli populations and of the variance for normal populations. Then, their consistency in mean of order two and in probability is studied. For two populations, the two-variable limits that appear are studied by splitting them into two one-variable limits or by bounding them.
Exercise 13pe-p *
> [Keywords] point estimations, basic estimators, normality, population variance, mean square error, consistency, rate of convergence.
> [Description] For the basic estimators of the variance of normal populations, the mean square errors are compared for one and two populations. The computer is used to compare graphically the coefficients that appear in the expressions of the mean square errors. The consistency is also studied graphically.
Exercise 14pe-p *
> [Keywords] point estimations, Bernoulli distribution, normal distribution, mean square error, consistency, pooled sample proportion, pooled sample variance, rate of convergence.
> [Description] The mean square error is calculated for some pooled estimators of the proportion for Bernoulli populations and of the variance for normal populations. Then, their consistency in mean of order two and in probability is studied. For pooled estimators, it suffices for one sample size to tend to infinity; that is, one sample can do the whole work. Each pooled estimator is compared with the natural estimator formed by the semisum of the estimators of the two populations. The computer is also used to compare graphically the coefficients that appear in the expressions of the mean square errors, and the consistency is studied graphically.
Methods and Properties
Exercise 1pe
> [Keywords] point estimations, method of the moments, mean square error, consistency, maximum likelihood method.
> [Description] Given the density function of a population variable, the method of the moments is applied to find an estimator of the parameter. The mean square error of this estimator is calculated, and finally its consistency is studied. The maximum likelihood method is also applied; the maximum cannot be found by using the derivatives, so some qualitative reasoning is necessary. A simple analytical calculation suffices to see how the likelihood function depends on the parameter. The two methods provide different estimators.
Exercise 2pe
> [Keywords] point estimations, Rayleigh distribution, method of the moments, mean square error, consistency, maximum likelihood method.
> [Description] Assuming that a population variable follows the Rayleigh distribution, the method of the moments is applied to build an estimator of the parameter. The mean square error of this estimator is calculated and its consistency is studied. The maximum likelihood method is also applied to build an estimator of the parameter. For this population distribution, the two methods provide different estimators. As a mathematical exercise, the expressions of the mean and the variance are calculated.
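For Exercise 2pe, the two Rayleigh estimators can be compared numerically. This minimal Python sketch assumes the parametrization f(x) = (x/sigma^2) exp(-x^2/(2 sigma^2)) for x >= 0; the exercise may use a different one. Under this assumption, E[X] = sigma sqrt(pi/2), so the method of the moments gives sigma_hat = mean(X) sqrt(2/pi). Setting the derivative of the log-likelihood to zero gives sigma_hat^2 = sum(X_i^2)/(2n). The helper names are hypothetical.

```python
import math
import random


def rayleigh_sample(sigma, n, rng):
    """Draw n Rayleigh(sigma) values by inverting F(x) = 1 - exp(-x^2 / (2 sigma^2))."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return [sigma * math.sqrt(-2 * math.log(1 - rng.random())) for _ in range(n)]


def rayleigh_estimators(sample):
    """Return (method-of-moments, maximum-likelihood) estimates of sigma."""
    n = len(sample)
    # Method of the moments: equate E[X] = sigma * sqrt(pi / 2) to the sample mean.
    sigma_mm = (sum(sample) / n) * math.sqrt(2 / math.pi)
    # Maximum likelihood: d/d(sigma) log L = -2n/sigma + sum(x^2)/sigma^3 = 0.
    sigma_ml = math.sqrt(sum(x * x for x in sample) / (2 * n))
    return sigma_mm, sigma_ml


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, for reproducibility only
    data = rayleigh_sample(2.0, 10_000, rng)
    print(rayleigh_estimators(data))  # two different estimates, both near 2
```

On the same data the two estimates generally differ, which is consistent with the exercise's remark that the two methods provide different estimators.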
Exercise 3pe
> [Keywords] point estimations, exponential distribution, method of the moments, maximum likelihood method, sufficiency, likelihood function, factorization theorem, sample mean, efficiency, consistency, plug-in principle.
> [Description] A thorough statistical study of the exponential distribution is carried out. To estimate the parameter, two estimators are obtained by applying the method of the moments and the maximum likelihood method; for this population distribution, both methods provide the same estimator. A sufficient statistic is found. The sample mean is studied as an estimator of the parameter and of its inverse. This exercise highlights how important the mathematical notation can be when doing calculations.
Confidence Intervals (CI)
Methods for Estimating
Exercise 1ci-m
> [Keywords] confidence intervals, method of the pivot, asymptoticness, normal distribution, margin of error.
> [Description] The method of the pivot is applied twice to construct asymptotic confidence intervals for the mean and the standard deviation of a normally distributed population variable with unknown mean and variance. For the first interval, the expression of the margin of error is used to obtain the confidence when the length of the interval is one unit.
Exercise 2ci-m
> [Keywords] confidence intervals, method of the pivot, asymptoticness, normal distribution, margin of error.
> [Description] The method of the pivot is applied to construct an asymptotic confidence interval for the mean of a population variable with unknown variance. A previous estimate of the mean lies inside the interval obtained. The value of the margin of error is given explicitly.
Exercise 3ci-m
> [Keywords] confidence intervals, method of the pivot, Bernoulli distribution, asymptoticness.
> [Description] The method of the pivot is applied to construct an asymptotic confidence interval for the proportion of a population variable following the Bernoulli distribution.
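The pivot of Exercise 3ci-m can be sketched as follows. The sketch assumes the usual asymptotic (Wald) pivot (p_hat - p) / sqrt(p_hat (1 - p_hat) / n), approximately N(0, 1); the exercise might standardize differently. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Asymptotic confidence interval for a Bernoulli proportion.

    Inverts the approximate pivot (p_hat - p) / sqrt(p_hat * (1 - p_hat) / n) ~ N(0, 1).
    """
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # upper alpha/2 quantile
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - margin, p_hat + margin


if __name__ == "__main__":
    # Hypothetical data: 40 successes in 100 trials.
    low, high = proportion_ci(40, 100)
    print(f"({low:.4f}, {high:.4f})")  # prints (0.3040, 0.4960)
```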
Exercise 4ci-m
> [Keywords] confidence intervals, asymptoticness, method of the pivot, Bernoulli distribution, pooled sample proportion.
> [Description] A confidence interval for the difference between two proportions is constructed by applying the method of the pivot. The interval allows us to make a decision about the equality of the proportions, which is equivalent to applying a two-tailed hypothesis test. As an advanced task, the exercise is repeated with the pooled sample proportion used in the denominator of the statistic (the estimation of the variances of the populations) but not in the numerator (the estimation of the difference between the proportions).
Minimum Sample Size
Exercise 1ci-s
> [Keywords] confidence intervals, minimum sample size, normal distribution, method of the pivot, margin of error, Chebyshev's inequality.
> [Description] To find the minimum sample size that theoretically guarantees the desired precision, two methods are applied: one based on the expression of the margin of error and the other based on Chebyshev's inequality.
Methods and Sample Size
Exercise 1ci
> [Keywords] confidence intervals, minimum sample size, normal distribution, method of the pivot, margin of error, Chebyshev's inequality.
> [Description] A confidence interval for the mean of a normal population is built by applying the method of the pivotal quantity. The dependence of the length of the interval on the confidence is analysed qualitatively. Given all the other quantities, the minimum sample size is calculated in two different ways: with the method based on the expression of the margin of error and with the method based on Chebyshev's inequality.
Exercise 2ci
> [Keywords] confidence intervals, minimum sample size, asymptoticness, normal distribution, method of the pivot, margin of error, Chebyshev's inequality.
> [Description] An asymptotic confidence interval for the mean of a population random variable is constructed by applying the method of the pivotal quantity. The equivalent exact confidence interval can be obtained under the supposition that the variable is normally distributed. Given all the other quantities, the minimum sample size is calculated in two different ways: with the method based on the expression of the margin of error and with the method based on Chebyshev's inequality.
Exercise 3ci
> [Keywords] confidence intervals, minimum sample size, normal distribution, method of the pivot, margin of error, Chebyshev's inequality.
> [Description] A confidence interval for the mean of a normal population is built by applying the method of the pivotal quantity. Given all the other quantities, the minimum sample size is calculated in two different ways: with the method based on the expression of the margin of error and with the method based on Chebyshev's inequality. The dependence of the length of the interval on the confidence is analysed qualitatively.
Exercise 4ci
> [Keywords] confidence intervals, minimum sample size, normal distribution, method of the pivot, margin of error, Chebyshev's inequality.
> [Description] The method of the pivot allows us to construct a confidence interval for the difference between the means of two independent normal populations. Given the other quantities and supposing equal sample sizes, the minimum common sample size is calculated by applying two different methods: one based on the expression of the margin of error and the other based on Chebyshev's inequality.
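The two sample-size methods of Exercises 1ci-s to 4ci can be compared in a short Python sketch. It assumes the simplest case: the mean of one population with known sigma and the two-sided requirement P(|mean - mu| < E) >= 1 - alpha.

- The margin of error gives n >= (z_{1-alpha/2} sigma / E)^2, which relies on normality.
- Chebyshev's inequality, P(|mean - mu| >= E) <= sigma^2 / (n E^2), gives n >= sigma^2 / (alpha E^2) without it.

The function names and numbers are illustrative only.

```python
import math
from statistics import NormalDist


def min_n_margin(sigma, error, alpha):
    """Smallest n with z_{1-alpha/2} * sigma / sqrt(n) <= error (normal population, known sigma)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / error) ** 2)


def min_n_chebyshev(sigma, error, alpha):
    """Smallest n with sigma^2 / (n * error^2) <= alpha (Chebyshev's bound; no normality needed)."""
    return math.ceil(sigma ** 2 / (alpha * error ** 2))


if __name__ == "__main__":
    # Hypothetical requirement: sigma = 1, precision E = 0.3, alpha = 0.1.
    print(min_n_margin(1, 0.3, 0.1), min_n_chebyshev(1, 0.3, 0.1))  # prints 31 112
```

With these values the margin-of-error method asks for 31 observations and Chebyshev's inequality asks for 112. The extra observations are the price of dropping the normality assumption.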
Hypothesis Tests (HT)
Parametric
Based on T
Exercise 1ht-T
> [Keywords] hypothesis tests, normal distribution, two-tailed test, population mean, critical region, p-value, type I error, type II error, power function.
> [Description] A decision on whether the population mean of a variable equals a given number is made by applying a two-sided test and looking at both the critical values and the p-value. The two types of error are determined. With the help of a computer, the power function is plotted.
Exercise 2ht-T
> [Keywords] hypothesis tests, normal population, one-tailed test, population standard deviation, critical region, p-value, type I error, type II error, power function.
> [Description] A decision on whether the population standard deviation of a variable is smaller than a given number is made by applying a one-tailed test and looking at both the critical values and the p-value. The expression of the type II error is found. With the help of a computer, the power function is plotted. The form of the alternative hypothesis is analysed qualitatively. The assumption that the population variable follows the normal distribution is necessary to apply the results for studying the variance.
Exercise 3ht-T
> [Keywords] hypothesis tests, normal population, one- and two-tailed tests, population variance, critical region, p-value, type I error, type II error, power function.
> [Description] The equality of the population variance of a variable to a given number is tested against both one- and two-tailed alternative hypotheses. Decisions are made after looking at both the critical values and the p-value. In the two cases, the expression of the type II error is found and the power function is plotted with the help of a computer. The power functions are compared graphically, and the figure shows that, on the side covered by its alternative hypothesis, the one-sided test is uniformly more powerful than the two-sided test.
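For Exercise 1ht-T, the power function of a two-sided test on the mean has a closed form under one assumption, made only for this sketch: the population standard deviation sigma is known, so the test is a z test. With unknown sigma, a t-based power function would be needed instead. Let delta = (mu - mu0) sqrt(n) / sigma. The power is then Phi(-z_{1-alpha/2} - delta) + 1 - Phi(z_{1-alpha/2} - delta), which equals alpha at mu = mu0. The function name and the grid of means are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided_z(mu, mu0, sigma, n, alpha=0.05):
    """Probability of rejecting H0: mu = mu0 when the true mean is mu.

    Assumes a normal population with known sigma; the statistic
    Z = (mean - mu0) / (sigma / sqrt(n)) is then N(delta, 1) with
    delta = (mu - mu0) * sqrt(n) / sigma, and H0 is rejected when |Z| > z_crit.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = (mu - mu0) * sqrt(n) / sigma
    return STD_NORMAL.cdf(-z_crit - delta) + 1 - STD_NORMAL.cdf(z_crit - delta)


if __name__ == "__main__":
    # Tabulate the power function on a grid of true means (hypothetical setting).
    for mu in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(mu, round(power_two_sided_z(mu, mu0=0.0, sigma=1.0, n=25), 4))
```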
Exercise 4ht-T
> [Keywords] hypothesis tests, normal population, one- and two-tailed tests, population variance, critical region, p-value, type I error, type II error, power function, statistical cook.
> [Description] Starting from the hypotheses of a one-sided test on the population variance of a variable, different ways of reaching the opposite decision are considered qualitatively and quantitatively.
Exercise 5ht-T
> [Keywords] hypothesis tests, normal populations, one- and two-tailed tests, population standard deviation, critical region, p-value, type I error, type II error, power function.
> [Description] A decision on whether the population standard deviation of a variable equals a given value is made by considering three possible alternative hypotheses and looking at both the critical values and the p-value. The type II error is calculated and the power function is plotted. The power functions are compared graphically: the figure shows that each one-sided test is uniformly more powerful than the two-sided test on the side covered by its own alternative hypothesis.
Exercise 6ht-T
> [Keywords] hypothesis tests, Bernoulli populations, one-tailed tests, population proportion, critical region, p-value, type I error, type II error, power function.
> [Description] A decision on whether the population proportion is higher in one population than in the other is made after placing this inequality first in the null hypothesis and then in the alternative hypothesis. Two methodologies are considered, one based on the critical values and the other based on the p-value. In both tests, the type II error is calculated and the power function is plotted. The symmetry of the power functions of the two cases is highlighted. As an advanced section, the pooled sample proportion is used to estimate the variance of the populations in the denominator of the statistic, but not to estimate the difference between the population proportions in the numerator.
Based on Λ
Exercise 1ht-λ
> [Keywords] hypothesis tests, Neyman-Pearson's lemma, likelihood ratio test, critical region, Poisson distribution, exponential distribution, Bernoulli distribution, normal distribution.
> [Description] The critical region is studied theoretically for the null hypothesis that a parameter of the distribution equals a given value, against four different alternative hypotheses. The form of the region is related to the maximum likelihood estimator.
Analysis of Variance (ANOVA)
Exercise 1ht-av
> [Keywords] hypothesis tests, normal populations, analysis of variance, critical region, p-value, type I error, type II error.
> [Description] The analysis of variance is applied to test whether the means of three independent normal populations, whose variances are assumed to be equal, are the same. The calculations are repeated three times with different amounts of manual work.
Nonparametric
Exercise 1ht-np
> [Keywords] hypothesis tests, chi-square tests, independence tests, critical region, p-value, type I error, table of frequencies.
> [Description] The independence between two qualitative variables, or factors, is tested by applying the chi-square statistic.
Exercise 2ht-np
> [Keywords] hypothesis tests, chi-square tests, goodness-of-fit tests, critical region, p-value, type I error, table of frequencies.
> [Description] Goodness-of-fit is tested with the chi-square statistic, first to the whole Poisson family and then to a particular member of that family. Comparing the results of both sections highlights the importance of using the sample information instead of poorly justified assumptions.
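The statistic used in Exercise 1ht-np can be computed from a table of observed frequencies in a few lines. This sketch assumes Pearson's statistic sum((O - E)^2 / E), with expected frequencies E = (row total x column total) / n under independence and (r - 1)(c - 1) degrees of freedom. The table below is made up for illustration. The critical value or p-value still has to be taken from chi-square tables or software.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed frequencies.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if the two factors were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


if __name__ == "__main__":
    stat, df = chi_square_independence([[10, 20], [20, 10]])  # hypothetical counts
    print(round(stat, 3), df)  # prints 6.667 1
```

For 1 degree of freedom the 5% critical value is about 3.841, so these made-up counts would lead to rejecting independence at that level.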
Exercise 3ht-np
> [Keywords] hypothesis tests, chi-square tests, goodness-of-fit tests, independence tests, homogeneity tests, critical region, p-value, type I error, table of frequencies.
> [Description] The very same table of frequencies is regarded as coming from three different scenarios, and chi-square goodness-of-fit, independence and homogeneity tests are applied, respectively.
Parametric and Nonparametric
Exercise 1ht
> [Keywords] hypothesis tests, Bernoulli distribution, goodness-of-fit chi-square test, position signs test, critical region, p-value, type I error, type II error, power function, table of frequencies.
> [Description] The very same problem is dealt with through three different approaches: one parametric test and two kinds of nonparametric test. In this case, all three lead to the same decision.
PE, CI, HT
Exercise 1pe-ci-ht
> [Keywords] point estimations, confidence intervals, method of the pivot, normal distribution, t distribution, pooled sample variance.
> [Description] The probability of an event involving the difference between the means of two independent normal populations is calculated with and without the supposition that the variances of the populations are the same. The method of the pivot is applied to construct a confidence interval for the quotient of the standard deviations.
Exercise 2pe-ci-ht
> [Keywords] confidence intervals, point estimations, normal distribution, method of the pivot, probability, pooled sample variance.
> [Description] For the difference of the means of two independent normally distributed variables, a confidence interval is constructed by applying the method of the pivotal quantity. Since the equality of the means is included in a high-confidence interval, the pooled sample variance is used in calculating a probability involving the difference of the sample means.
Exercise 3pe-ci-ht
> [Keywords] hypothesis tests, confidence intervals, Bernoulli populations, one-tailed tests, population proportion, critical region, p-value, type I error, type II error, power function, method of the pivot.
> [Description] A decision on whether the population proportion in one population is smaller than or equal to that in the other is made by looking at both the critical values and the p-value. The type II error is calculated and the power function is plotted. By applying the method of the pivot, a confidence interval for the difference of the population proportions is built. This interval can be seen as the acceptance region of the equivalent two-sided hypothesis test. In this case, the same decision is made with the test and with the interval.
Exercise 4pe-ci-ht
> [Keywords] point estimations, hypothesis tests, standard power function density, method of the moments, maximum likelihood method, plug-in principle, Neyman-Pearson's lemma, likelihood ratio tests, critical region.
> [Description] Given the probability function of a population random variable, estimators are built by applying both the method of the moments and the maximum likelihood method. Then, the plug-in principle allows us to obtain estimators for the mean and the variance of the distribution of the variable. In testing the equality of the parameter to a given value, the form of the critical region is theoretically studied when four different types of alternative hypothesis are considered.
Additional Exercises
Solved but not ordered by difficulty, described or referred to in the final index.
Solved Exercises and Problems of Statistical Inference

Appendixes
Probability Theory (PT)
Some Reminders: Markov's Inequality. Chebyshev's Inequality. Probability and Moment Generating Functions. Characteristic Function.
Exercise 1pt
> [Keywords] probability, quantile, probability tables, probability function, binomial distribution, Poisson distribution, uniform distribution, normal distribution, chi-square distribution, t distribution, F distribution.
> [Description] For each of these distributions, the probability of a simple event is calculated both by using probability tables and by using the mass function or, the other way round, a quantile is found by using the probability tables or a statistical software program.
Exercise 2pt
> [Keywords] probability, normal distribution, total sum, sample mean, completion, standardization.
> [Description] For a quantity that follows the normal distribution with known parameters, the probability of an event involving the quantity is calculated after properly completing the two sides of the inequality, that is, after properly rewriting the event.
Exercise 3pt *
> [Keywords] probability, Bernoulli distribution, binomial distribution, geometric distribution, Poisson distribution, exponential distribution, normal distribution, raw or crude population moments, series, integral, probability generating function, moment generating function, characteristic function, differential equation, integral equation, complex analysis.
> [Description] For the distributions mentioned, the first two raw (crude) population moments are calculated in as many ways as possible. Their levels of difficulty differ, but the aim is to practice. Some calculations require strong mathematical justifications. Several interesting analytical techniques are used: changing the order of summation in series, using Taylor series, characterizing a function through a differential or integral equation, et cetera.
Mathematics (M)
Some Reminders: Limits
Exercise 1m *
> [Keywords] real analysis, integral, exponential function, bind, Fubini's theorem, integration by substitution, multiple integrals, polar coordinates.
> [Description] It is well known that the function $e^{-x^2}$ has no elementary antiderivative. The definite integral is calculated in three cases that appear frequently, e.g. when working with the density function of the normal or the Rayleigh distributions. By applying Fubini's theorem for improper integrals, the calculations are translated to the two-dimensional real space, where polar coordinates are used to solve the multiple integral easily.
Exercise 2m
> [Keywords] real analysis, limits, sequence, indeterminate forms.
> [Description] Several limits of one-variable sequences, similar to those necessary for other exercises, are calculated.
Exercise 3m *
> [Keywords] real analysis, limits, sequence, indeterminate forms, polar coordinates.
> [Description] Several limits of two-variable sequences, similar to those necessary for other exercises, are calculated.
Exercise 4m *
> [Keywords] algebra, geometry, real analysis, linear transformation, rotation, movement, frontier, rectangular coordinates.
> [Description] Several approaches are used to find the frontier and the regions determined by a discrete relation in the plane.
References
Tables of Statistics (T)
> [Keywords] estimators, statistics T, parametric tests, likelihood ratio, analysis of variance (ANOVA), nonparametric tests, chi-square tests, Kolmogorov-Smirnov tests, runs test of randomness, signs test of position, Wilcoxon signed-rank test of position.
> [Description] The statistics applied in the exercises are tabulated in this appendix. Some theoretical remarks are included.
Probability Tables (P)
> [Keywords] normal distribution, t distribution, chi-square distribution, F distribution.
> [Description] A probability table with the most frequently used values is included for each of the distributions mentioned above.
Index

Inference Theory [IT]
Framework and Scope of the Methods
Populations
[Ap1] When the entire populations can be studied, no inference is needed. Thus, here we suppose that we do not have such total knowledge.
[Ap2] Populations will be supposed to be independent (matched or paired data must be treated in a slightly different way).
Samples
[As1] Sample sizes are supposed to be much smaller than population sizes (a correction factor is not necessary for these practically infinite populations).
[As2] At the same time, we consider either any amount of normally distributed data or many data (large samples) from any distribution.
[As3] Data will be supposed to have been selected randomly, with the same probability and independently; that is, by applying simple random sampling.
Methods
[Am1] Before applying inferential methods, data should be analysed to guarantee that nothing strange will spoil the inference (we suppose that such descriptive analysis and data treatment have been done).
[Am2] We are able to learn only linearly, but in practice the methods need not be applied in the order in which they are presented here (e.g. nonparametric hypothesis tests can be used to check assumptions before applying parametric methods).
[IT] Some Remarks
Partial Knowledge and Randomness
The partial knowledge mentioned in the previous section has crucial consequences. On the one hand, the use of only some elements of the population implies that we can only make hypotheses about the other elements (variables must be assigned a random character); on the other hand, results will have no total certainty, in the sense that statements will be made with some probability. For example, a 95% confidence in applying a method must be interpreted as any other probability: the results are true with probability 0.95 and false with probability 0.05 (frequently, we will never know whether the method has failed or not). See the remark in the appendix of Probability Theory on the interpretation of the concept of probability.
In Probability Theory, random variables are dimensionless quantities; in real-life problems, variables almost always are not.
Since this fact usually does not cause trouble in Statistics, we do not pay much attention to the units of measurement, and we can understand that the magnitude of the real-life variable, with no unit of measurement, is the part that is being modeled by using the proper probability distribution with the proper parameter values (of course, units of measurement are not random). To get used to paying attention to the units of measurement and to managing them, they have been written in most numerical expressions.
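The frequency reading of "95% confidence" in the remarks above can be illustrated by simulation. The sketch below rests on assumptions that are not in the text:

- a normal population with known standard deviation;
- the interval $\bar x \pm z\,\sigma/\sqrt{n}$;
- arbitrary values for $\mu$, $\sigma$, $n$, the number of repetitions and the random seed.

The function name `ci_coverage` is hypothetical. Each simulated interval either contains $\mu$ or does not; only the long-run proportion is close to 0.95.

```python
import random
import statistics
from math import sqrt
from statistics import NormalDist


def ci_coverage(mu=10.0, sigma=2.0, n=25, reps=10_000, level=0.95, seed=1):
    """Proportion of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu.

    Assumes a normal population with KNOWN sigma; all default values are
    illustrative choices, not taken from the text.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # A single interval either contains mu or it does not
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps   # long-run proportion, close to `level`


if __name__ == "__main__":
    print(ci_coverage())
```

Changing `level` or `n` shows that the proportion follows the nominal confidence, whatever the width of each single interval.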

Regarding the interpretation of the whole statistical processes that we will apply, either to practice their use or to solve particular real-world problems, we highlight the main points on which results are usually based:
(i) Assumptions.
(ii) The method applied, including particular details of its steps, mathematical theorems, statistic T, etc.
(iii) Certainty with which the method is applied: probability, confidence or significance.
(iv) The data available.
In Statistics, results may change severely when assumptions are actually false, another method is applied, a different certainty is considered, or the data do not carry proper information (quantity, quality, representativeness, etc.). Throughout this document, we insist on the cautions that statisticians and readers of statistical works must take in interpreting results. Even if you are not interested in statistically cooking data, you had better know the recipes... Some of them have been included in the notes mentioned in the prologue.
Use of the Samples
Let $X_1, \ldots, X_n$ be the data of a population. The information they contain is extracted and used through appropriate mathematical functions: estimators and statistics. When applying the methods, since we usually need to calculate a probability or to find a quantile, expressions must be written in terms of those appropriate quantities whose sampling distribution is known. In trying to make estimators or statistics appear, some Mathematics are needed. We do not repeat them whenever they are applied in this document. For example, standardization is a strictly increasing transformation that does not change inequalities when it is applied to both sides. Similarly, the positive branch of the square root must be considered to work with population or sample variances and standard deviations: these concepts are nonnegative by definition, while the square root is a general mathematical tool applied to this particular situation.
As an example of those mathematical explanations not repeated again and again, we include the following:
Remark: Since variances are nonnegative by definition and the positive branch of the square root function is strictly increasing, it holds that $\sigma_X^2 = \sigma_Y^2 \Leftrightarrow \sigma_X = \sigma_Y$ (similarly for inequalities). For general numbers $a$ and $b$, it holds only that $a^2 = b^2 \Leftrightarrow |a| = |b|$. From a strict mathematical point of view, for the standard deviation we should write $\sigma = +\sqrt{\sigma^2} = |\sigma|$.
Finally, at the end of the possible theoretical part of exercises, we do not insist that a sample $x_1, \ldots, x_n$ would in practice be used by entering its values in the theoretical expressions obtained as a solution. Estimators and statistics are random quantities until specific data are used.
Useful Questions
To work out the answer, users may find it useful to ask themselves:
On the Populations: How many populations are there? Are their probability distributions known?
On the Samples: If populations are not normally distributed, are the sample sizes large enough to apply asymptotic results? Do we know the data themselves, or only some quantities calculated from them?

On the Assumptions: What is supposed to be true? Does it seem reasonable? Do we need to prove it? Should it be checked for the populations: the random character, the independence of the populations, the goodness-of-fit to the supposed models, the homogeneity between the populations, et cetera? Should it be checked for the samples: the within-sample randomness and independence, the between-samples independence, et cetera? Are there other assumptions (either mathematical or statistical)?
On the Statistical Problem: What are the quantities to be studied statistically? Concretely, what is the statistical problem: point estimation, confidence interval, hypothesis test, etc.?
On the Statistical Tools: Which are the estimators, the statistics and the methods that will be applied?
On the Quantities: Which are the units of measurement? Are all the units equal? How large are the magnitudes? Do they seem reasonable? Are all of them coherent (variability is positive, probabilities and relative frequencies are between 0 and 1, etc.)?
On the Interpretation: What is the statistical interpretation of the solution? How is the statistical solution interpreted in the framework of the problem we are working on? Do the qualitative results seem reasonable (as expected)? Do the quantities seem reasonable (signs, order of magnitude, etc.)?
Users may want to consult some other pieces of advice that we have written in Guide for Students of Statistics, available at http://www.casado-d.org/edu/guideforstudentsofstatistics-slides.pdf.
[IT] Sampling Probability Distribution
Remark 1it: The notation and the expression of the most basic estimators, for one population, are
$$\bar X = \frac{1}{n}\sum_{i=1}^n X_i, \quad V^2 = \frac{1}{n}\sum_{j=1}^n (X_j-\mu)^2, \quad s^2 = \frac{1}{n}\sum_{j=1}^n (X_j-\bar X)^2, \quad S^2 = \frac{1}{n-1}\sum_{j=1}^n (X_j-\bar X)^2, \quad \hat\eta = \frac{1}{n}\sum_{i=1}^n X_i$$
(the last one for Bernoulli populations, where it is the sample proportion). For two populations, other basic estimators are made with these: $\bar X - \bar Y$, $V_X^2/V_Y^2$, $s_X^2/s_Y^2$, $S_X^2/S_Y^2$ and $\hat\eta_X - \hat\eta_Y$. Finally, all these estimators are used to make statistics whose sampling distribution is known.
Exercise 1it-spd
Given a population variable $X$ following the probability distribution determined by the following values and

probabilities:
Value $x$: 1 | 2 | 3
Probability $p$: 3/9 | 1/9 | 5/9
Determine:
a) The joint probability distribution of the sample $(X_1, X_2)$.
b) The sampling probability distribution of the sample mean $\bar X$.
(Based on an exercise of the materials in Spanish prepared by my workmates.)
Discussion: The distribution of $X$ is totally determined, since we know all the information necessary to calculate any quantity (e.g. the mean):
$$\mu_X = E(X) = \sum_{\Omega} x_j P(X = x_j) = \sum_{\{1,2,3\}} x_j p_j = 1\cdot\frac{3}{9} + 2\cdot\frac{1}{9} + 3\cdot\frac{5}{9} = \frac{20}{9} \approx 2.22.$$
Instead of a table, a function is sometimes used to provide the values and the probabilities (the mass or density function). We can represent this function with the computer:
values = c(1, 2, 3)
probabilities = c(3/9, 1/9, 5/9)
plot(values, probabilities, type='h', xlab='value', ylab='probability', ylim=c(0,1), main='Mass Function', lwd=7)
The sampling probability distribution of $\bar X$ is determined once we give the possible values and the probabilities with which they can be taken. Before doing that, we describe the probability distribution of the random vector $(X_1, X_2)$.
a) Joint probability distribution of the sample
Since the $X_j$ are independent in any simple random sample, the probability that $(X_1, X_2)$ takes the value $(1, 1)$, for example, is calculated as follows (note the intersection):
$$f(1,1) = P(\{X_1 = 1\} \cap \{X_2 = 1\}) = P(X_1 = 1)\,P(X_2 = 1) = \frac{3}{9}\cdot\frac{3}{9} = \frac{1}{9}.$$
To fill in the following table, the other probabilities are calculated in the same way.
Joint Probability Distribution of $(X_1, X_2)$
Value $(x_1, x_2)$: (1,1) | (1,2) | (1,3) | (2,1) | (2,2) | (2,3) | (3,1) | (3,2) | (3,3)
Probability of $(x_1, x_2)$: 1/9 | 1/27 | 5/27 | 1/27 | 1/81 | 5/81 | 5/27 | 5/81 | 25/81
Notice that $(1,3)$ and $(3,1)$, for example, contain the same information. The values and their probabilities can

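be given by extension (table or figure) or by comprehension (function).
## Install this package if you don't have it (run the following line without #)
# install.packages('scatterplot3d')
values1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
values2 = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
probabilities = c(1/9, 1/27, 5/27, 1/27, 1/81, 5/81, 5/27, 5/81, 25/81)
library('scatterplot3d') # To load the package
scatterplot3d(values1, values2, probabilities, type='h', xlab='value 1', ylab='value 2', zlab='probability', xlim=c(0,4), ylim=c(0,4), zlim=c(0,1), main='Mass Function', lwd=7)
That the total sum of probabilities is equal to one can be checked:
$$\sum_{\Omega} f(x_1, x_2) = \frac{1}{9} + \frac{1}{27} + \frac{5}{27} + \frac{1}{27} + \frac{1}{81} + \frac{5}{81} + \frac{5}{27} + \frac{5}{81} + \frac{25}{81} = \frac{9+3+15+3+1+5+15+5+25}{81} = \frac{81}{81} = 1.$$
From the information in the table it is possible to calculate any quantity (e.g. the first-order joint moment):
$$\mu_{1,1} = E(X_1 X_2) = \sum_{\Omega} x_1 x_2 f(x_1, x_2) = 1\cdot1\cdot\frac{1}{9} + 1\cdot2\cdot\frac{1}{27} + \cdots + 3\cdot3\cdot\frac{25}{81} = \frac{400}{81} \approx 4.938.$$
b) Sampling probability distribution of the sample mean
The sample mean $\bar X = (X_1 + X_2)/2$ is a random quantity, since so are $X_1$ and $X_2$. Each pair of values $(x_1, x_2)$ of $(X_1, X_2)$ gives a value $\bar x$ of $\bar X$; conversely, a value $\bar x$ of $\bar X$ can correspond to different pairs of values $(x_1, x_2)$. Then, we will fill in a table with all values and merge those that are equal. For example, $(1, 2)$ gives $\bar x = (1 + 2)/2 = 3/2$. The other values $\bar x$ of $\bar X$ are calculated in the same way to fill in the following table:
Value $(x_1, x_2)$: (1,1) | (1,2) | (1,3) | (2,1) | (2,2) | (2,3) | (3,1) | (3,2) | (3,3)
Probability of $(x_1, x_2)$: 1/9 | 1/27 | 5/27 | 1/27 | 1/81 | 5/81 | 5/27 | 5/81 | 25/81
Value $\bar x$ of $\bar X$: 1 | 3/2 | 2 | 3/2 | 2 | 5/2 | 2 | 5/2 | 3
The sample mean can take five different values, while $(X_1, X_2)$ could take nine different possible values $(x_1, x_2)$. Thus, the probability for $\bar X$ to take the value 2, for example, is calculated as follows (note the union):
$$P(\bar X = 2) = P(\{(1,3)\} \cup \{(2,2)\} \cup \{(3,1)\}) = P(1,3) + P(2,2) + P(3,1) = \frac{5}{27} + \frac{1}{81} + \frac{5}{27} = \frac{31}{81}.$$
In the same way, $P(\bar X = 1) = P(\{(1,1)\}) = \frac{1}{9}$,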

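$$P(\bar X = \tfrac{3}{2}) = P(\{(1,2)\} \cup \{(2,1)\}) = P(1,2) + P(2,1) = \frac{1}{27} + \frac{1}{27} = \frac{2}{27},$$
$$P(\bar X = \tfrac{5}{2}) = P(\{(2,3)\} \cup \{(3,2)\}) = P(2,3) + P(3,2) = \frac{5}{81} + \frac{5}{81} = \frac{10}{81},$$
$$P(\bar X = 3) = P(\{(3,3)\}) = \frac{25}{81}.$$
Then, the sampling probability distribution of the sample mean is determined, in this case, by
Probability Distribution of $\bar X$
Value $\bar x$: 1 | 3/2 | 2 | 5/2 | 3
Probability of $\bar x$: 1/9 | 2/27 | 31/81 | 10/81 | 25/81
We can check that the total sum of probabilities is equal to one:
$$\sum_{\Omega} P(\bar X = \bar x_j) = \frac{1}{9} + \frac{2}{27} + \frac{31}{81} + \frac{10}{81} + \frac{25}{81} = \frac{9+6+31+10+25}{81} = \frac{81}{81} = 1.$$
From the information in the table above it is possible to calculate any quantity (e.g. the mean):
$$\mu_{\bar X} = E(\bar X) = \sum_{\Omega} \bar x_j P(\bar X = \bar x_j) = 1\cdot\frac{1}{9} + \frac{3}{2}\cdot\frac{2}{27} + 2\cdot\frac{31}{81} + \frac{5}{2}\cdot\frac{10}{81} + 3\cdot\frac{25}{81} = \frac{9+9+62+25+75}{81} = \frac{180}{81} = \frac{20}{9} \approx 2.22.$$
It is worth noticing that this value is equal to the value that we obtained at the beginning, which agrees with the well-known theoretical property $\mu_{\bar X} = E(\bar X) = E(X) = \mu_X$.
Values and probabilities can also be provided by using a function (the mass or density function), which can be represented with the help of a computer:
values = c(1, 3/2, 2, 5/2, 3)
probabilities = c(1/9, 2/27, 31/81, 10/81, 25/81)
plot(values, probabilities, type='h', xlab='value', ylab='probability', ylim=c(0,1), main='Mass Function', lwd=7)
Conclusion: For a simple distribution of $X$ and a small sample size ($n = 2$), we have written both the joint probability distribution of the sample and the sampling distribution of $\bar X$. This helps us to understand the concept of the sampling distribution of any random quantity (not only the sample mean), whether or not we are able to write it down or even to know it (e.g. thanks to a theorem).
My notes: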
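The calculations of this exercise can be reproduced by enumerating all the samples. The minimal Python sketch below keeps exact fractions, so its output can be compared entry by entry with the tables above. The function names are illustrative, not from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population of Exercise 1it-spd: values 1, 2, 3 with probabilities 3/9, 1/9, 5/9
population = {1: Fraction(3, 9), 2: Fraction(1, 9), 3: Fraction(5, 9)}


def joint_distribution(pop, n=2):
    """Joint mass function of a simple random sample of size n.

    Independence makes each probability the product of the marginal ones.
    """
    joint = {}
    for xs in product(pop, repeat=n):   # every possible sample (x1, ..., xn)
        p = Fraction(1)
        for x in xs:
            p *= pop[x]
        joint[xs] = p
    return joint


def sample_mean_distribution(pop, n=2):
    """Sampling distribution of the sample mean: merge samples with equal mean."""
    dist = defaultdict(Fraction)
    for xs, p in joint_distribution(pop, n).items():
        dist[Fraction(sum(xs), n)] += p   # union of disjoint events: add
    return dict(dist)


def expectation(dist):
    """Sum of value * probability over a finite distribution."""
    return sum(x * p for x, p in dist.items())


if __name__ == "__main__":
    for value, prob in sorted(sample_mean_distribution(population).items()):
        print(value, prob)
```

Larger samples can be explored by changing `n`, although the number of sample points grows as $3^n$.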

Point Estimations [PE]
Methods for Estimating
Remark 1pe: When necessary, the expectations $E(X)$ and $E(X^2)$ are usually given in the statement; once $E(X)$ is given, either $Var(X)$ or $E(X^2)$ can equivalently be given, since $Var(X) = E(X^2) - E(X)^2$. If not given, these expectations can be calculated from their definitions by adding up or integrating, for discrete and continuous variables respectively (this is sometimes an advanced mathematical exercise).
Remark 2pe: If the method of the moments is used to estimate $m$ parameters (frequently one or two), the first $m$ equations of the system usually suffice; nevertheless, if not all the parameters appear in the first-order moments of $X$, the smallest $m$ moments (and equations) in which the parameters appear must be considered. For example, if $\mu = 0$, or if the interest lies directly in $\sigma$ because $\mu$ is known, the first-order equation $\mu_1 = \mu = E(X) = m_1$ does not involve $\sigma$, and hence the second-order equation $\mu_2 = E(X^2) = Var(X) + E(X)^2 = \sigma^2 + \mu^2 = m_2$ must be considered instead.
Remark 3pe: When looking for local maxima or minima of differentiable functions, the first-order derivatives are set equal to zero. After that, to discriminate between maxima and minima, the second-order derivatives are studied. For most of the functions we will work with, this second step can be solved by applying some qualitative reasoning on the sign of the quantities involved and the possible values of the data $x_i$. When this does not suffice, the values found in the first step, say $\theta_0$, must be substituted in the expression of the second step. On the other hand, global maxima and minima cannot in general be found using the derivatives, and some qualitative reasoning must be applied. It is important to highlight that, in applying the maximum likelihood method, the purpose is to find the maximum, whatever the mathematical route.
Exercise 1pe-m
If $X$ is a population variable that follows a binomial distribution of parameters $\kappa$ and $\eta$, and $X_1, \ldots, X_n$ is a simple random sample:
a) Apply the method of the moments to obtain an estimator of the parameter $\eta$.
b) Apply the maximum likelihood method to obtain an estimator of the parameter $\eta$.
c) When $\kappa = 10$ and $x = (x_1, \ldots, x_5) = (4, 4, 3, 5, 6)$, use the estimators obtained in the two previous sections to construct final estimates of the parameter $\eta$ and the measures $\mu$ and $\sigma^2$.
Hint: (i) In the first two sections, treat the parameter $\kappa$ as if it were known. (ii) In the likelihood function, join the combinatorial terms into a product; this product does not depend on the parameter $\eta$, and hence the derivative of its logarithm with respect to $\eta$ will be zero.
Discussion: This statement is mathematical, although in the last section we are given some data to be substituted. In practice, the use of the binomial distribution to explain $X$ should be supported. The variable $X$ is dimensionless. For the binomial distribution, $\mu = E(X) = \kappa\eta$ and $\sigma^2 = Var(X) = \kappa\eta(1-\eta)$. See the appendixes to see how the mean and the variance of this distribution can be calculated. Particularly, the results obtained here can be applied to the Bernoulli distribution with $\kappa = 1$.
a) Method of the moments
a1) Population and sample moments: The probability distribution has two parameters originally, but we have to study only one. The first-order moments are
$\mu_1(\eta) = E(X) = \kappa\eta$ and $m_1(x_1, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$.
a2) System of equations: Since the parameter of interest $\eta$ appears in the first-order population moment of

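$X$, the first equation is enough to apply the method:
$$\mu_1(\eta) = m_1(x_1, \ldots, x_n) \;\Rightarrow\; \kappa\eta = \bar x \;\Rightarrow\; \eta = \frac{\bar x}{\kappa}.$$
a3) The estimator: $\hat\eta_M = \frac{\bar X}{\kappa}$.
b) Maximum likelihood method
b1) Likelihood function: For the binomial distribution the mass function is $f(x; \kappa, \eta) = \binom{\kappa}{x}\eta^x(1-\eta)^{\kappa-x}$. We are interested only in $\eta$, so
$$L(x_1, \ldots, x_n; \eta) = \prod_{j=1}^n f(x_j; \eta) = \prod_{j=1}^n \binom{\kappa}{x_j}\eta^{x_j}(1-\eta)^{\kappa-x_j} = \left[\prod_{j=1}^n \binom{\kappa}{x_j}\right]\eta^{\sum_j x_j}(1-\eta)^{n\kappa-\sum_j x_j}.$$
b2) Optimization problem: The logarithm function is applied to facilitate the calculations:
$$\log L(x_1, \ldots, x_n; \eta) = \log\left[\prod_{j=1}^n \binom{\kappa}{x_j}\right] + \left(\sum_j x_j\right)\log\eta + \left(n\kappa - \sum_j x_j\right)\log(1-\eta).$$
To discover the local (or relative) extreme values, the necessary condition is
$$0 = \frac{d}{d\eta}\log L = \frac{\sum_j x_j}{\eta} - \frac{n\kappa - \sum_j x_j}{1-\eta} \;\Rightarrow\; (1-\eta)\sum_j x_j = \eta\left(n\kappa - \sum_j x_j\right) \;\Rightarrow\; \sum_j x_j = n\kappa\eta \;\Rightarrow\; \eta_0 = \frac{\sum_j x_j}{n\kappa} = \frac{\bar x}{\kappa}.$$
To verify that the only candidate is a local (or relative) maximum, the sufficient condition is
$$\frac{d^2}{d\eta^2}\log L = -\frac{\sum_j x_j}{\eta^2} - \frac{n\kappa - \sum_j x_j}{(1-\eta)^2} < 0,$$
since $\kappa \ge x_j$ and therefore $n\kappa - \sum_j x_j \ge 0$ (and the two numerators cannot both be zero). This holds for any value, including $\eta_0$.
b3) The estimator: $\hat\eta_{ML} = \frac{\bar X}{\kappa}$.
c) Estimation of $\eta$, $\mu$ and $\sigma^2$
For $\kappa = 10$ and $x = (4, 4, 3, 5, 6)$, $\bar x = \frac{4+4+3+5+6}{5} = \frac{22}{5} = 4.4$.
From the method of the moments: $\hat\eta_M = \frac{\bar x}{\kappa} = \frac{4.4}{10} = 0.44$.
From the maximum likelihood method, as the same estimator was obtained: $\hat\eta_{ML} = 0.44$.
Since $\mu = E(X) = \kappa\eta$, an estimator of $\eta$ induces an estimator of $\mu$ by applying the plug-in principle: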

From the method of the moments: $\hat\mu_M = \kappa\hat\eta_M = 10\cdot0.44 = 4.4$.
From the maximum likelihood method: $\hat\mu_{ML} = 4.4$.
Finally, since $\sigma^2 = Var(X) = \kappa\eta(1-\eta)$, an estimator of $\eta$ induces an estimator of $\sigma^2$ too:
From the method of the moments: $\hat\sigma^2_M = \kappa\hat\eta_M(1-\hat\eta_M) = 10\cdot0.44\cdot0.56 = 2.464$.
From the maximum likelihood method: $\hat\sigma^2_{ML} = 2.464$.
Conclusion: We can see that for the binomial population the two methods provide the same estimator for $\eta$. The value of $\kappa$ must be known to use the expression obtained. In this particular case, the value 0.44 indicates that, for each of the $\kappa$ underlying trials (Bernoulli variables), the value 0 seems slightly more probable than the value 1. On the other hand, the quality of the estimator obtained should be studied, especially if the two methods had provided different estimators. As a particular case, $\kappa = 1$ for the Bernoulli distribution.
My notes:
Exercise 2pe-m
A random quantity $X$ is supposed to follow a geometric distribution. Let $X_1, \ldots, X_n$ be a simple random sample.
A) Apply the method of the moments to find an estimator of the parameter $\eta$.
B) Apply the maximum likelihood method to find an estimator of the parameter $\eta$.
C) Given a sample such that $\sum_{j=1}^{27} x_j = 134$, apply the formulas obtained in the two previous sections to give final estimates of $\eta$. Finally, give estimates of the mean and the variance of $X$.
Discussion: This statement is mathematical, although we are given some data in the last section. The random variable $X$ is dimensionless. For the geometric distribution (here $X$ counts the trials until the first success, so $x = 1, 2, \ldots$), $\mu = E(X) = 1/\eta$ and $\sigma^2 = Var(X) = (1-\eta)/\eta^2$. See the appendixes to see how the mean and the variance of this distribution can be calculated.
A) Method of the moments
a1) Population and sample moments: The population distribution has only one parameter, so one equation suffices. The first-order moments of the model $X$ and the sample $x$ are, respectively,
$\mu_1(\eta) = E(X) = \frac{1}{\eta}$ and $m_1(x_1, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$.
a2) System of equations: Since the parameter of interest $\eta$ appears in the first-order moment of $X$, the first equation suffices:
$$\mu_1(\eta) = m_1(x_1, \ldots, x_n) \;\Rightarrow\; \frac{1}{\eta} = \bar x \;\Rightarrow\; \eta = \frac{1}{\bar x}.$$
a3) The estimator: $\hat\eta_M = \frac{1}{\bar X} = \frac{n}{\sum_{j=1}^n X_j}$.
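For the numerical part C, only $n = 27$ and $\sum_j x_j = 134$ are needed. The minimal sketch below assumes the parametrization used above: $X$ counts trials, so $x_j \ge 1$, $E(X) = 1/\eta$ and $Var(X) = (1-\eta)/\eta^2$. The helper name `geometric_moment_estimates` is hypothetical. Exact fractions are kept until the end, and the last line shows the effect of plugging in the rounded value 0.20 instead.

```python
from fractions import Fraction


def geometric_moment_estimates(n, total):
    """Method-of-moments estimates for a geometric model with E(X) = 1/eta.

    n     -- sample size
    total -- sum of the observations (each observation x_j >= 1)
    Returns exact fractions (eta_hat, mu_hat, var_hat).
    """
    eta_hat = Fraction(n, total)            # eta = 1/xbar = n / sum(x_j)
    mu_hat = 1 / eta_hat                    # plug-in: mu = 1/eta
    var_hat = (1 - eta_hat) / eta_hat ** 2  # plug-in: sigma^2 = (1 - eta)/eta^2
    return eta_hat, mu_hat, var_hat


if __name__ == "__main__":
    eta, mu, var = geometric_moment_estimates(27, 134)
    print(float(eta), float(mu), float(var))
    # Same plug-in formula with the rounded value 0.20 instead of 27/134
    eta_rounded = 0.20
    print((1 - eta_rounded) / eta_rounded ** 2)
```

With exact arithmetic the variance estimate is $134\cdot107/27^2 \approx 19.67$, whereas the rounded $\eta$ gives about 20. This is why it pays to delay rounding.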

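B) Maximum likelihood method
b1) Likelihood function: For the geometric distribution, the mass function is $f(x; \eta) = \eta(1-\eta)^{x-1}$, so
$$L(x_1, \ldots, x_n; \eta) = \prod_{j=1}^n f(x_j; \eta) = \prod_{j=1}^n \eta(1-\eta)^{x_j-1} = \eta^n(1-\eta)^{\sum_j x_j - n}.$$
b2) Optimization problem: The logarithm function is applied to make calculations easier:
$$\log L(x_1, \ldots, x_n; \eta) = n\log\eta + \left(\sum_j x_j - n\right)\log(1-\eta).$$
The population distribution has only one parameter, so a one-dimensional function must be maximized. To find the local (or relative) extreme values, the necessary condition is:
$$0 = \frac{d}{d\eta}\log L = \frac{n}{\eta} - \frac{\sum_j x_j - n}{1-\eta} \;\Rightarrow\; n(1-\eta) = \eta\left(\sum_j x_j - n\right) \;\Rightarrow\; n = \eta\sum_j x_j \;\Rightarrow\; \eta_0 = \frac{n}{\sum_j x_j} = \frac{1}{\bar x}.$$
To verify that the only candidate is a local maximum, the sufficient condition is:
$$\frac{d^2}{d\eta^2}\log L = -\frac{n}{\eta^2} - \frac{\sum_j x_j - n}{(1-\eta)^2} < 0,$$
as $\sum_j x_j - n \ge 0$ (note that $x_j \ge 1$). This holds for any value, including $\eta_0$.
b3) The estimator: $\hat\eta_{ML} = \frac{1}{\bar X} = \frac{n}{\sum_{j=1}^n X_j}$.
C) Estimation of $\eta$, $\mu$ and $\sigma^2$
Since $n = 27$ and $\sum_{j=1}^{27} x_j = 134$:
From the method of the moments: $\hat\eta_M = \frac{1}{\bar x} = \frac{n}{\sum_j x_j} = \frac{27}{134} \approx 0.20$.
From the maximum likelihood method, as the same estimator was obtained: $\hat\eta_{ML} \approx 0.20$.
Since $\mu = E(X) = 1/\eta$, an estimator of $\eta$ induces an estimator of $\mu$:
From the method of the moments: $\hat\mu_M = \frac{1}{\hat\eta_M} = \frac{134}{27} \approx 4.96$.
From the maximum likelihood method, since the same estimator was obtained: $\hat\mu_{ML} \approx 4.96$.
Note: From the numerical point of view, calculating 134/27 is expected to have a smaller error than calculating 1/0.20.
Finally, since $\sigma^2 = Var(X) = \frac{1-\eta}{\eta^2}$, the plug-in principle gives: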

From the method of the moments: $\hat\sigma^2_M = \frac{1-\hat\eta_M}{\hat\eta_M^2} = \frac{1 - 27/134}{(27/134)^2} = \frac{134\cdot107}{27^2} \approx 19.67$.
From the maximum likelihood method: $\hat\sigma^2_{ML} \approx 19.67$.
Note: Again, working with the exact fractions avoids the rounding error of 0.20; for instance, $(1-0.20)/0.20^2 = 20$ instead of 19.67.
Conclusion: For the geometric model, the two methods provide the same estimator for $\eta$. We have used the estimator of $\eta$ to obtain estimators of $\mu$ and $\sigma^2$. On the other hand, the quality of the estimator obtained should be studied, especially if the two methods had provided different estimators.
My notes:
Exercise 3pe-m
A real-world variable is modeled by using a random variable $X$ that follows a Poisson distribution. Given a simple random sample of size $n$,
A) Apply the method of the moments to obtain an estimator of the parameter $\lambda$.
B) Apply the maximum likelihood method to obtain an estimator of the parameter $\lambda$.
C) Use these estimators to build estimators of the mean $\mu$ and the variance $\sigma^2$ of the distribution of $X$.
Discussion: Although a real-world population is mentioned, this statement is mathematical. It is implicitly assumed that the Poisson model is appropriate to study that variable (which can be supposed to be dimensionless). In a statistical study, this supposition should be evaluated, e.g. by applying a hypothesis test, before looking for an estimator of the population parameter. For a Poisson random variable, $\mu = E(X) = \lambda$ and $\sigma^2 = Var(X) = \lambda$. See the appendixes to see how the mean and the variance of this distribution can be calculated.
A) Method of the moments
a1) Population and sample moments: The population distribution has only one parameter, so one equation suffices. The first-order moments of the model $X$ and the sample $x$ are, respectively,
$\mu_1(\lambda) = E(X) = \lambda$ and $m_1(x_1, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$.
a2) System of equations: Since the parameter of interest $\lambda$ appears in the first-order moment of $X$, the first equation suffices. The system has only one trivial equation:
$$\mu_1(\lambda) = m_1(x_1, \ldots, x_n) \;\Rightarrow\; \lambda = \bar x.$$
a3) The estimator: $\hat\lambda_M = \bar X = \frac{1}{n}\sum_{j=1}^n X_j$.
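Before the analytical derivation of part B, a crude numerical check can be made. Evaluate the Poisson log-likelihood on a grid of $\lambda$ values and see where it peaks. The counts, the grid limits and the function names below are illustrative assumptions. The call `math.lgamma(x + 1)` is used for $\log(x!)$.

```python
import math


def poisson_loglik(lam, data):
    """log L(lam) = (sum x_j) log(lam) - n*lam - sum log(x_j!), for lam > 0."""
    return (sum(data) * math.log(lam) - len(data) * lam
            - sum(math.lgamma(x + 1) for x in data))   # lgamma(x+1) = log(x!)


def grid_argmax(data, lo=0.01, hi=10.0, steps=10_000):
    """Grid point in [lo, hi] with the largest log-likelihood (crude search)."""
    step = (hi - lo) / steps
    grid = [lo + k * step for k in range(steps + 1)]
    return max(grid, key=lambda lam: poisson_loglik(lam, data))


if __name__ == "__main__":
    data = [2, 0, 3, 1, 4, 2, 2]   # made-up counts, for illustration only
    print(grid_argmax(data), sum(data) / len(data))
```

For these made-up counts the peak is found next to $\bar x = 14/7 = 2$, which is also $\hat\lambda_M$.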

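B) Maximum likelihood method
b1) Likelihood function: We write the product and reorder the terms that are similar:
$$L(x_1, \ldots, x_n; \lambda) = \prod_{j=1}^n f(x_j; \lambda) = \prod_{j=1}^n \frac{\lambda^{x_j}}{x_j!}e^{-\lambda} = \frac{\lambda^{x_1}}{x_1!}e^{-\lambda}\cdots\frac{\lambda^{x_n}}{x_n!}e^{-\lambda} = \frac{\lambda^{\sum_j x_j}}{\prod_j x_j!}\,e^{-n\lambda}.$$
b2) Optimization problem: The logarithm function is applied to make calculations easier:
$$\log L(x_1, \ldots, x_n; \lambda) = \log\left[\lambda^{\sum_j x_j}\right] + \log\left[e^{-n\lambda}\right] - \log\left[\prod_j x_j!\right] = \left(\sum_j x_j\right)\log\lambda - n\lambda - \log\left[\prod_j x_j!\right].$$
The population distribution has only one parameter, so a one-dimensional function must be maximized. To find the local extreme values, the necessary condition is:
$$0 = \frac{d}{d\lambda}\log L = \frac{\sum_j x_j}{\lambda} - n \;\Rightarrow\; \lambda_0 = \frac{1}{n}\sum_j x_j = \bar x.$$
To verify that the only candidate is a local maximum, the sufficient condition is:
$$\frac{d^2}{d\lambda^2}\log L = -\frac{\sum_j x_j}{\lambda^2} < 0,$$
since $x_j \in \{0, 1, \ldots\}$ implies $\sum_j x_j \ge 0$, with strict inequality unless all the data are zero. In that degenerate case $\log L = -n\lambda$ decreases with $\lambda$, so the maximum is again attained at $\lambda = 0 = \bar x$, on the boundary. Then, the second derivative is negative, also for $\lambda_0$.
b3) The estimator: For $\lambda$, it is obtained after substituting the lower-case letters $x_j$ (numbers representing THE sample we have) by upper-case letters $X_j$ (random variables representing A possible sample we may have):
$$\hat\lambda_{ML} = \bar X = \frac{1}{n}\sum_{j=1}^n X_j.$$
C) Estimation of $\mu$ and $\sigma^2$
To obtain estimators of the mean and the variance, we take into account that for this model $\mu = E(X) = \lambda$ and $\sigma^2 = Var(X) = \lambda$, so by applying the plug-in principle:
$$\hat\mu = \hat\lambda = \bar X, \qquad \hat\sigma^2 = \hat\lambda = \bar X.$$
Conclusion: For the Poisson model, the two methods provide the same estimator for $\lambda$, and therefore for $\mu$ and $\sigma^2$ when the plug-in principle is applied. On the other hand, the quality of the estimator obtained should be studied (though the sample mean is a well-known estimator).
My notes:
Exercise 4pe-m
A random variable $X$ follows the normal distribution. Let $X_1, \ldots, X_n$ be a simple random sample of $X$ (seen as the population). To obtain an estimator of the parameters $\theta = (\mu, \sigma^2)$, apply:
A) The method of the moments
B) The maximum likelihood method
Discussion: This statement is mathematical.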

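For this distribution, the mean and the variance are directly $\mu$ and $\sigma^2$; this is proved in the appendixes.
A) Method of the moments
a1) Population and sample moments: The population distribution has two parameters, so two equations are considered. The first-order moments are
$\mu_1(\mu, \sigma^2) = E(X) = \mu$ and $m_1(x_1, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$,
while the second-order moments are
$\mu_2(\mu, \sigma^2) = E(X^2) = Var(X) + E(X)^2 = \sigma^2 + \mu^2$ and $m_2(x_1, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j^2$.
a2) System of equations:
$$\begin{cases}\mu_1(\mu, \sigma^2) = m_1(x_1, \ldots, x_n)\\ \mu_2(\mu, \sigma^2) = m_2(x_1, \ldots, x_n)\end{cases} \Rightarrow \begin{cases}\mu = \bar x\\ \sigma^2 + \mu^2 = \frac{1}{n}\sum_j x_j^2\end{cases} \Rightarrow \begin{cases}\mu = \bar x\\ \sigma^2 = \frac{1}{n}\sum_j x_j^2 - \bar x^2 = s_x^2\end{cases}$$
where $Var(X) = E(X^2) - E(X)^2$ and $s_x^2 = \frac{1}{n}\sum_j (x_j - \bar x)^2 = \frac{1}{n}\sum_j x_j^2 - \bar x^2$ have been used.
a3) The estimator: $\hat\theta_M = (\hat\mu_M, \hat\sigma^2_M) = (\bar X, s^2)$.
B) Maximum likelihood method
b1) Likelihood function: The density function of the Gaussian distribution is $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Then,
$$L(x_1, \ldots, x_n; \mu, \sigma^2) = \prod_{j=1}^n f(x_j; \mu, \sigma^2) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_j-\mu)^2}{2\sigma^2}} = \frac{1}{(2\pi\sigma^2)^{n/2}}\,e^{-\frac{1}{2\sigma^2}\sum_j (x_j-\mu)^2}.$$
b2) Optimization problem
Logarithm: The logarithm function is applied to make calculations easier:
$$\log L(x_1, \ldots, x_n; \mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_j (x_j-\mu)^2.$$
Maximum: The population distribution has two parameters, and then it is necessary to maximize a two-dimensional function. To discover the local extreme values, the necessary conditions (differentiating with respect to $\mu$ and $\sigma$) are:
$$\begin{cases}\frac{\partial}{\partial\mu}\log L = -\frac{1}{2\sigma^2}\sum_j 2(x_j-\mu)(-1) = 0\\ \frac{\partial}{\partial\sigma}\log L = -\frac{n}{2}\cdot\frac{4\pi\sigma}{2\pi\sigma^2} + \frac{1}{\sigma^3}\sum_j (x_j-\mu)^2 = 0\end{cases}$$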

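$$\Rightarrow \begin{cases}\frac{1}{\sigma^2}\sum_j (x_j-\mu) = 0\\ -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_j (x_j-\mu)^2 = 0\end{cases} \Rightarrow \begin{cases}\sum_j x_j - n\mu = 0\\ \sigma^2 = \frac{1}{n}\sum_j (x_j-\mu)^2\end{cases} \Rightarrow \begin{cases}\mu = \bar x\\ \sigma^2 = \frac{1}{n}\sum_j (x_j-\bar x)^2 = s_x^2\end{cases}$$
To verify that the only candidate is a local maximum, the sufficient conditions on the partial derivatives of second order are:
$$A = \frac{\partial^2}{\partial\mu^2}\log L = \frac{\partial}{\partial\mu}\left[\frac{1}{\sigma^2}\sum_j (x_j-\mu)\right] = -\frac{n}{\sigma^2}$$
$$B = \frac{\partial^2}{\partial\mu\,\partial\sigma}\log L = \frac{\partial}{\partial\sigma}\left[\frac{1}{\sigma^2}\sum_j (x_j-\mu)\right] = -\frac{2}{\sigma^3}\sum_j (x_j-\mu)$$
$$C = \frac{\partial^2}{\partial\sigma^2}\log L = \frac{\partial}{\partial\sigma}\left[-\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_j (x_j-\mu)^2\right] = \frac{n}{\sigma^2} - \frac{3}{\sigma^4}\sum_j (x_j-\mu)^2$$
To calculate $D = B^2 - AC$, substituting the pair $(\mu, \sigma) = (\bar x, s_x)$ in $A$, $B$ and $C$ simplifies the work:
$$A(\bar x, s_x) = -\frac{n}{s_x^2} < 0, \qquad B(\bar x, s_x) = -\frac{2}{s_x^3}\sum_j (x_j-\bar x) = 0, \qquad C(\bar x, s_x) = \frac{n}{s_x^2} - \frac{3}{s_x^4}\,n s_x^2 = -\frac{2n}{s_x^2} < 0,$$
$$D(\bar x, s_x) = 0 - \left(-\frac{n}{s_x^2}\right)\left(-\frac{2n}{s_x^2}\right) = -\frac{2n^2}{s_x^4} < 0,$$
as $\sum_j (x_j-\bar x) = \sum_j x_j - n\bar x = 0$ and $\sum_j (x_j-\bar x)^2 = n s_x^2$. Then, $\log L(x; \mu, \sigma)$ has a maximum at $(\mu, \sigma) = (\bar x, s_x)$, since it is a local extreme value and $D < 0$, $A < 0$ (equivalently at $\sigma^2 = s_x^2$, since $\sigma \mapsto \sigma^2$ is strictly increasing for $\sigma > 0$).
b3) The estimator: $\hat\theta_{ML} = (\hat\mu_{ML}, \hat\sigma^2_{ML}) = (\bar X, s^2)$.
Conclusion: Since in this case there are two parameters, both the parameter and its estimator can be thought of as two-dimensional quantities: $\theta = (\mu, \sigma^2)$ and $\hat\theta = (\hat\mu, \hat\sigma^2)$. On the other hand, the quality of the estimator obtained should be studied, especially if the two methods had provided different estimators.
My notes: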
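This result can be given a numerical sanity check under illustrative assumptions (a made-up sample and hypothetical helper names). In Python, `statistics.pvariance` divides by $n$, so it returns $s^2$; `statistics.variance` divides by $n - 1$ and returns $S^2$. The log-likelihood at $(\bar x, s^2)$ should exceed its value at nearby points.

```python
import math
import statistics


def normal_loglik(mu, sigma2, data):
    """log L = -(n/2) log(2*pi*sigma2) - sum((x_j - mu)^2) / (2*sigma2), sigma2 > 0."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


def normal_mle(data):
    """Return (xbar, s^2), where s^2 divides by n (statistics.pvariance)."""
    return statistics.fmean(data), statistics.pvariance(data)


def beats_neighbours(data, delta=0.1):
    """True if log L at the MLE exceeds log L at four nearby points."""
    mu_hat, s2 = normal_mle(data)
    best = normal_loglik(mu_hat, s2, data)
    moves = [(delta, 0.0), (-delta, 0.0), (0.0, delta), (0.0, -delta)]
    return all(normal_loglik(mu_hat + dm, s2 + dv, data) < best for dm, dv in moves)


if __name__ == "__main__":
    data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]   # made-up sample
    mu_hat, s2 = normal_mle(data)
    print(mu_hat, s2, statistics.variance(data))  # s^2 versus S^2 = n/(n-1) * s^2
    print(beats_neighbours(data))
```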

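Exercise 5pe-m
The uniform distribution $U[0, \theta]$ has
$$f(x; \theta) = \begin{cases}\frac{1}{\theta} & \text{if } x \in [0, \theta]\\ 0 & \text{otherwise}\end{cases}$$
as a density function. Let $X_1, \ldots, X_n$ be a simple random sample of a population $X$ following this probability distribution.
A) Apply the method of the moments to find an estimator of the parameter $\theta$.
B) Apply the maximum likelihood method to find an estimator of the parameter $\theta$.
Use this estimator to build others for the mean and the variance of $X$.
Discussion: This statement is mathematical, and there is no supposition that would require justification. The random variable $X$ is dimensionless. We are given the density function of the distribution of $X$, though for this distribution it could be deduced from the fact that the density is constant on $[0, \theta]$ and must integrate to one. For the general continuous uniform distribution $U[a, b]$, $\mu = (a+b)/2$ and $\sigma^2 = (b-a)^2/12$.
Note: If we had not remembered the first population moments, with the notation of this exercise we could do
$$E(X) = \int_{-\infty}^{+\infty} x f(x; \theta)\,dx = \int_0^\theta x\frac{1}{\theta}\,dx = \frac{1}{\theta}\left[\frac{x^2}{2}\right]_0^\theta = \frac{\theta}{2},$$
$$E(X^2) = \int_{-\infty}^{+\infty} x^2 f(x; \theta)\,dx = \int_0^\theta x^2\frac{1}{\theta}\,dx = \frac{1}{\theta}\left[\frac{x^3}{3}\right]_0^\theta = \frac{\theta^2}{3},$$
so $\mu = E(X) = \frac{\theta}{2}$ and $\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \frac{\theta^2}{3} - \frac{\theta^2}{4} = \frac{\theta^2}{12}$.
A) Method of the moments
a1) Population and sample moments: For uniform distributions, discrete or continuous, the mean is the middle value. Then, the first-order moments of the distribution and of the sample are
$\mu_1(\theta) = \frac{0+\theta}{2} = \frac{\theta}{2}$ and $m_1(x_1, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$.
a2) System of equations:
$$\mu_1(\theta) = m_1(x_1, \ldots, x_n) \;\Rightarrow\; \frac{\theta}{2} = \bar x \;\Rightarrow\; \theta_0 = 2\bar x.$$
a3) The estimator: $\hat\theta_M = 2\bar X$.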

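B) Maximum likelihood method
b1) Likelihood function: The density function is $f(x; \theta) = \frac{1}{\theta}$ for $0 \le x \le \theta$, so
$$L(x_1, \ldots, x_n; \theta) = \prod_{j=1}^n f(x_j; \theta) = \prod_{j=1}^n \frac{1}{\theta} = \frac{1}{\theta^n}$$
when $0 \le x_j \le \theta$ for all $j$ (otherwise $L = 0$).
b2) Optimization problem: First, we try to discover the maximum by applying the technique based on the derivatives. The logarithm function is applied, $\log L(x_1, \ldots, x_n; \theta) = \log(\theta^{-n}) = -n\log\theta$, and the first condition leads to a useless equation:
$$0 = \frac{d}{d\theta}\log L = -\frac{n}{\theta},$$
which has no solution for $\theta > 0$. Then, we realize that global minima and maxima cannot always be found through the derivatives (only if they are also local extremes). In fact, it is easy to see that the function $L$ monotonically decreases with $\theta$, and therefore monotonically increases when $\theta$ decreases. This pattern, or just the opposite, tends to happen when the probability function changes monotonically with the parameter, e.g. when the parameter appears only once in the expression. As a consequence, $L$ has no local extreme values. Since, on the other hand, $0 \le x_j \le \theta$ for all $j$, we have $\theta \ge \max_j\{x_j\}$; but $L$ increases when $\theta$ decreases, so the maximum is attained at the smallest admissible value:
$$\theta_0 = \max_j\{x_j\}.$$
b3) The estimator: It is obtained after substituting the lower-case letters $x_j$ (numbers representing THE sample we have) by upper-case letters $X_j$ (random variables representing A possible sample we may have):
$$\hat\theta_{ML} = \max_j\{X_j\}.$$
C) Estimation of $\mu$ and $\sigma^2$
To obtain estimators of the mean, we take into account that $\mu = E(X) = \frac{\theta}{2}$ and apply the plug-in principle:
$$\hat\mu_M = \frac{\hat\theta_M}{2} = \bar X, \qquad \hat\mu_{ML} = \frac{\hat\theta_{ML}}{2} = \frac{\max_j\{X_j\}}{2}.$$
To obtain estimators of the variance, since $\sigma^2 = Var(X) = \frac{\theta^2}{12}$,
$$\hat\sigma^2_M = \frac{\hat\theta_M^2}{12} = \frac{\bar X^2}{3}, \qquad \hat\sigma^2_{ML} = \frac{\hat\theta_{ML}^2}{12} = \frac{(\max_j\{X_j\})^2}{12}.$$
Conclusion: For the uniform distribution, the two methods provide different estimators of the parameter, and hence of the mean and the variance. The quality of the estimators obtained should be studied.
My notes: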
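One way to start studying the quality of the two estimators is a Monte Carlo comparison of their mean squared errors. This is only a sketch: $\theta = 1$, $n = 20$, the number of repetitions and the seed are arbitrary choices, and the names are hypothetical. For reference, the exact values are standard results not derived in the text: $\theta^2/(3n)$ for $2\bar X$ and $2\theta^2/((n+1)(n+2))$ for $\max_j\{X_j\}$. The first printed line also shows that $2\bar x$ can be smaller than the largest observation, which no admissible $\theta$ can be.

```python
import random
import statistics


def estimators(sample):
    """Return (theta_M, theta_ML) = (2 * xbar, max x_j) for U[0, theta]."""
    return 2 * statistics.fmean(sample), max(sample)


def mse_comparison(theta=1.0, n=20, reps=5_000, seed=7):
    """Monte Carlo mean squared errors of both estimators (illustrative settings)."""
    rng = random.Random(seed)
    se_m = se_ml = 0.0
    for _ in range(reps):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        t_m, t_ml = estimators(sample)
        se_m += (t_m - theta) ** 2
        se_ml += (t_ml - theta) ** 2
    return se_m / reps, se_ml / reps


if __name__ == "__main__":
    # 2*xbar can fall below the largest observation, which is impossible for theta
    print(estimators([0.1, 0.1, 0.9]))
    print(mse_comparison())
```

In this setting the maximum is biased downwards but has a much smaller mean squared error than $2\bar X$. A complete study would also look at bias and consistency.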

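Exercise 6pe-m
A random quantity $X$ is supposed to follow a distribution whose probability function is, for $\theta > 0$,
$$f(x;\theta) = \begin{cases} 0 & \text{if } x < 3 \\ \frac{1}{\theta}e^{-\frac{x-3}{\theta}} & \text{if } x \ge 3 \end{cases}$$
(A) Apply the method of the moments to find an estimator of the parameter $\theta$.
(B) Apply the maximum likelihood method to find an estimator of the parameter $\theta$.
(C) Use the estimators obtained to build estimators of the mean $\mu$ and the variance $\sigma^2$.
Hint: Use that $E(X) = \theta + 3$ and $E(X^2) = 2\theta^2 + 6\theta + 9$.
Discussion: This statement is mathematical. The random variable is supposed to be dimensionless. The probability function and the first two moments are given, which is enough to apply the two methods. In the last step, the plug-in principle will be applied.
Note: If $E(X)$ had not been given in the statement, it could have been calculated by applying integration by parts (since polynomials and exponentials are functions of different type):
$$E(X) = \int_{-\infty}^{+\infty} x\,f(x;\theta)\,dx = \int_3^{\infty} x\,\frac{1}{\theta}e^{-\frac{x-3}{\theta}}\,dx = \left[-x\,e^{-\frac{x-3}{\theta}}\right]_3^{\infty} + \int_3^{\infty} e^{-\frac{x-3}{\theta}}\,dx = 3 + \left[-\theta\,e^{-\frac{x-3}{\theta}}\right]_3^{\infty} = 3 + \theta.$$
That $\int u(x)\,v'(x)\,dx = u(x)\,v(x) - \int u'(x)\,v(x)\,dx$ has been used with
$$u(x) = x,\quad u'(x) = 1,\quad v'(x) = \frac{1}{\theta}e^{-\frac{x-3}{\theta}},\quad v(x) = \int \frac{1}{\theta}e^{-\frac{x-3}{\theta}}\,dx = -e^{-\frac{x-3}{\theta}}.$$
On the other hand, $e^x$ changes faster than $x^k$ for any $k$, so the boundary terms vanish at infinity. To calculate $E(X^2)$:
$$E(X^2) = \int_3^{\infty} x^2\,\frac{1}{\theta}e^{-\frac{x-3}{\theta}}\,dx = \left[-x^2 e^{-\frac{x-3}{\theta}}\right]_3^{\infty} + \int_3^{\infty} 2x\,e^{-\frac{x-3}{\theta}}\,dx = 9 + 2\theta\int_3^{\infty} x\,\frac{1}{\theta}e^{-\frac{x-3}{\theta}}\,dx = 9 + 2\theta\mu = 9 + 2\theta(\theta+3) = 2\theta^2 + 6\theta + 9.$$
Integration by parts has been applied with $u(x) = x^2$, $u'(x) = 2x$ and the same $v'(x)$ and $v(x)$ as before. Again, $e^x$ changes faster than $x^k$ for any $k$.
(A) Method of the moments
(a1) Population and sample moments: There is only one parameter, so one equation suffices. The first-order moments of the model $X$ and the sample $x$ are, respectively,
$$\mu_1(\theta) = E(X) = \theta + 3 \quad\text{and}\quad m_1(x_1,\dots,x_n) = \frac{1}{n}\sum_{j=1}^{n} x_j = \bar{x}$$
(a2) System of equations: Since the parameter of interest $\theta$ appears in the first-order moment of $X$, the first equation suffices: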

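$$\mu_1(\theta) = m_1(x_1,\dots,x_n) \iff \theta + 3 = \frac{1}{n}\sum_{j=1}^{n} x_j = \bar{x} \iff \theta_0 = \bar{x} - 3$$
(a3) The estimator: $\hat{\theta}_M = \bar{X} - 3$
(B) Maximum likelihood method
(b1) Likelihood function: For this probability distribution, the density function is $f(x;\theta) = \frac{1}{\theta}e^{-\frac{x-3}{\theta}}$ for $x \ge 3$, so
$$L(x_1,\dots,x_n;\theta) = \prod_{j=1}^{n} f(x_j;\theta) = \prod_{j=1}^{n} \frac{1}{\theta}e^{-\frac{x_j-3}{\theta}} = \frac{1}{\theta^n}\,e^{-\frac{1}{\theta}\sum_{j=1}^{n}(x_j-3)}$$
(b2) Optimization problem: The logarithm function is applied to make calculations easier:
$$\log[L(x_1,\dots,x_n;\theta)] = -n\log\theta - \frac{1}{\theta}\sum_{j=1}^{n}(x_j-3)$$
The population distribution has only one parameter, so a one-dimensional function must be maximized. To find the local (or relative) extreme values, the necessary condition is:
$$0 = \frac{d}{d\theta}\log[L(x_1,\dots,x_n;\theta)] = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{j=1}^{n}(x_j-3) \iff n\theta = \sum_{j=1}^{n}(x_j-3) \iff \theta_0 = \frac{1}{n}\sum_{j=1}^{n}x_j - 3 = \bar{x} - 3$$
To verify that the only candidate is a local maximum, the sufficient condition is:
$$\frac{d^2}{d\theta^2}\log[L(x_1,\dots,x_n;\theta)] = \frac{n}{\theta^2} - \frac{2}{\theta^3}\sum_{j=1}^{n}(x_j-3) \overset{?}{<} 0$$
The first term is always positive but the second is always negative, so we had better substitute the candidate:
$$\left.\frac{d^2}{d\theta^2}\log[L(x_1,\dots,x_n;\theta)]\right|_{\theta=\theta_0} = \frac{n}{\theta_0^2} - \frac{2n\theta_0}{\theta_0^3} = -\frac{n}{\theta_0^2} < 0$$
(b3) The estimator: $\hat{\theta}_{ML} = \bar{X} - 3$
(C) Estimation of $\mu$ and $\sigma^2$
(c1) For the mean: By using the hint and the plug-in principle,
From the method of the moments: $\hat{\mu}_M = \hat{\theta}_M + 3 = \bar{X} - 3 + 3 = \bar{X}$.
From the maximum likelihood method, as the same estimator was obtained: $\hat{\mu}_{ML} = \bar{X}$.
(c2) For the variance: We must write it in terms of the first two moments of $X$,
$$\sigma^2 = Var(X) = E(X^2) - E(X)^2 = 2\theta^2 + 6\theta + 9 - (\theta+3)^2 = 2\theta^2 + 6\theta + 9 - \theta^2 - 6\theta - 9 = \theta^2$$
Then,
From the method of the moments: $\hat{\sigma}^2_M = \hat{\theta}_M^2 = (\bar{X} - 3)^2$.
From the maximum likelihood method: $\hat{\sigma}^2_{ML} = \hat{\theta}_{ML}^2 = (\bar{X} - 3)^2$.
Conclusion: For this model, the two methods provide the same estimator. We have used the estimator of $\theta$ to obtain estimators of $\mu$ and $\sigma^2$. The quality of the estimator obtained should be studied, especially if the two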

methods had provided different estimators. Regarding the original probability distribution: (i) the expression $\frac{1}{\theta}e^{-\frac{x-3}{\theta}}$ reminds us of the exponential distribution; (ii) the term $x-3$ suggests a translation; and (iii) the variance $\theta^2$ is the same as the variance of the exponential distribution. After translating all possible values $x$, the mean is also translated but the variance is not. Thus, the distribution of the statement is a translation of the exponential distribution. In fact, the distribution with probability function
$$f(x;\theta,\delta) = \frac{1}{\theta}e^{-\frac{x-\delta}{\theta}}, \quad x > \delta,$$
and zero elsewhere is termed the two-parameter exponential distribution. It is a translation of size $\delta$ of the usual exponential distribution. A particular, simple case is obtained for $\theta = 1$ and $\delta = 0$, since then $f(x) = e^{-x}$, $x > 0$.
My notes:
Exercise 7pe-m
A random quantity $X$ is supposed to follow a distribution whose probability function is, for $\theta > 0$,
$$f(x;\theta) = \begin{cases} \frac{3x^2}{\theta^3} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$
(A) Apply the method of the moments to find an estimator of the parameter $\theta$.
(B) Apply the maximum likelihood method to find an estimator of the parameter $\theta$.
(C) Use the estimators obtained to build estimators of the mean $\mu$ and the variance $\sigma^2$.
Hint: Use that $E(X) = \frac{3\theta}{4}$ and $Var(X) = \frac{3\theta^2}{80}$.
Discussion: This statement is mathematical. The random variable is supposed to be dimensionless. The probability function and the first two moments are given, which is enough to apply the two methods. In the last step, the plug-in principle will be applied.
Note: If $E(X)$ had not been given in the statement, it could have been calculated by integrating:
$$E(X) = \int_{-\infty}^{+\infty} x\,f(x;\theta)\,dx = \int_0^{\theta} x\,\frac{3x^2}{\theta^3}\,dx = \frac{3}{\theta^3}\left[\frac{x^4}{4}\right]_0^{\theta} = \frac{3}{4}\theta$$
On the other hand, if $Var(X)$ had not been given in the statement, it could have been calculated by using a property and integrating:
$$E(X^2) = \int_0^{\theta} x^2\,\frac{3x^2}{\theta^3}\,dx = \frac{3}{\theta^3}\left[\frac{x^5}{5}\right]_0^{\theta} = \frac{3}{5}\theta^2.$$
Now, $\mu = E(X) = \frac{3}{4}\theta$ and
$$\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \frac{3}{5}\theta^2 - \frac{9}{16}\theta^2 = \frac{3}{80}\theta^2.$$
(A) Method of the moments
(a1) Population and sample moments: There is only one parameter, so one equation suffices. The first-order moments of the model $X$ and the sample $x$ are, respectively,

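$$\mu_1(\theta) = E(X) = \frac{3}{4}\theta \quad\text{and}\quad m_1(x_1,\dots,x_n) = \frac{1}{n}\sum_{j=1}^{n} x_j = \bar{x}$$
(a2) System of equations: Since the parameter of interest $\theta$ appears in the first-order moment of $X$, the first equation suffices:
$$\mu_1(\theta) = m_1(x_1,\dots,x_n) \iff \frac{3}{4}\theta = \bar{x} \iff \theta_0 = \frac{4}{3}\bar{x}$$
(a3) The estimator: $\hat{\theta}_M = \frac{4}{3}\bar{X}$
(B) Maximum likelihood method
(b1) Likelihood function: For this probability distribution, the density function is $f(x;\theta) = \frac{3x^2}{\theta^3}$ for $0 \le x \le \theta$, so
$$L(x_1,\dots,x_n;\theta) = \prod_{j=1}^{n} f(x_j;\theta) = \prod_{j=1}^{n} \frac{3x_j^2}{\theta^3} = \frac{3^n \prod_{j=1}^{n} x_j^2}{\theta^{3n}}$$
(b2) Optimization problem: The logarithm function is applied to make calculations easier:
$$\log[L(x_1,\dots,x_n;\theta)] = n\log 3 - 3n\log\theta + 2\sum_{j=1}^{n}\log x_j$$
Now, if we try to find the maximum by looking at the first-order derivatives, a useless equation is obtained:
$$0 = \frac{d}{d\theta}\log[L(x_1,\dots,x_n;\theta)] = -\frac{3n}{\theta} \;\Rightarrow\; \theta = \,?$$
Then, we realize that global minima and maxima can in general be found through the derivatives only if they are also local. It is easy to see that the function $L$ monotonically increases when $\theta$ decreases (this pattern, or just the opposite, tends to happen when the probability function changes monotonically with the parameter, e.g. when the parameter appears only once in the expression). As a consequence, it has no local extreme values. On the other hand, $0 \le x_j \le \theta$ for all $j$, so $\theta \ge \max_j\{x_j\}$. But $L$ increases as $\theta$ decreases, so
$$\theta_0 = \max_j\{x_j\}$$
(b3) The estimator: $\hat{\theta}_{ML} = \max_j\{X_j\}$
(C) Estimation of $\mu$ and $\sigma^2$
(c1) For the mean: By using the hint and the plug-in principle,
From the method of the moments: $\hat{\mu}_M = \frac{3}{4}\hat{\theta}_M = \frac{3}{4}\cdot\frac{4}{3}\bar{X} = \bar{X}$.
From the maximum likelihood method: $\hat{\mu}_{ML} = \frac{3}{4}\hat{\theta}_{ML} = \frac{3}{4}\max_j\{X_j\}$.
(c2) For the variance: By using that principle again,
From the method of the moments: $\hat{\sigma}^2_M = \frac{3}{80}\hat{\theta}_M^2 = \frac{3}{80}\cdot\frac{16}{9}\bar{X}^2 = \frac{\bar{X}^2}{15}$.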

From the maximum likelihood method: $\hat{\sigma}^2_{ML} = \frac{3}{80}\hat{\theta}_{ML}^2 = \frac{3}{80}\max_j\{X_j\}^2$.
Conclusion: For this model, the two methods provide different estimators. The quality of the estimators obtained should be studied. We have used the estimator of $\theta$ to obtain estimators of $\mu$ and $\sigma^2$.
My notes:
[PE] Properties of Estimators
Remark pe: As regards the sample sizes, we can talk about static situations, where we study the dependence of the concepts on the sizes or on the possible relation between the sizes (say, one size equal to a constant $c$ times the other). On the other hand, we can talk about dynamic situations, where the same dependences are studied asymptotically while the sample sizes are always increasing (say, with the sizes at step $k$ related through a constant $c_k$), where $k$ is the index of a sequence of statistical schemes with those sample sizes. Statistically, we are interested in sequences with nondecreasing sample sizes; mathematically, all possible sequences should be taken into account. The static and the dynamic situations were respectively represented in two figures (not reproduced here).
Remark 5pe: We do not usually use the definition of the mean square error but the result at the end of the following equalities:
$$MSE(\hat{\theta}) = E\left[(\hat{\theta}-\theta)^2\right] = E\left[\left(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\right)^2\right] = E\left[\left(\hat{\theta}-E(\hat{\theta})\right)^2\right] + 2\left[E(\hat{\theta})-\theta\right]E\left[\hat{\theta}-E(\hat{\theta})\right] + \left[E(\hat{\theta})-\theta\right]^2 = Var(\hat{\theta}) + b(\hat{\theta})^2,$$
since $E\left[\hat{\theta}-E(\hat{\theta})\right] = 0$ and the bias is $b(\hat{\theta}) = E(\hat{\theta}) - \theta$.
Remark 6pe: To study the consistency in probability we have been taught a sufficient but not necessary condition, which is equivalent to the consistency in mean of order two (managing the definition directly is quite complex). Thus, this type of consistency is proved when the condition is fulfilled, which is sufficient but not necessary for the consistency in probability. By using Chebyshev's inequality:
$$P\left(|\hat{\theta}-\theta| \ge \epsilon\right) \le \frac{E\left[(\hat{\theta}-\theta)^2\right]}{\epsilon^2} = \frac{MSE(\hat{\theta})}{\epsilon^2} \;\Rightarrow\; \lim_{n\to\infty} P\left(|\hat{\theta}-\theta| \ge \epsilon\right) \le \frac{\lim_{n\to\infty} MSE(\hat{\theta})}{\epsilon^2}$$
If the sufficient condition is not fulfilled, the estimator under study is not consistent in mean of order two, but it can still be consistent in probability (this type of consistency should then be studied in a different way).
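Remarks 5pe and 6pe can be illustrated by Monte Carlo. The sketch below is only an illustration under assumed values (the estimator $2\bar{X}$ of Exercise 5pe-m, theta = 5, n = 10, 10,000 replications, epsilon = 2). With empirical moments the decomposition MSE = bias² + Var holds exactly, and the Chebyshev-type bound holds for the empirical distribution as well.

```r
set.seed(123)
theta <- 5; n <- 10; R <- 10000   # illustrative values (assumptions)

# R replicated samples of size n from U[0, theta]; estimator 2 * sample mean
est <- replicate(R, 2 * mean(runif(n, min = 0, max = theta)))

mse  <- mean((est - theta)^2)       # Monte Carlo mean square error
bias <- mean(est) - theta           # Monte Carlo bias
vr   <- mean((est - mean(est))^2)   # variance computed with divisor R

# Remark 5pe: MSE = bias^2 + Var
c(mse = mse, bias2_plus_var = bias^2 + vr)

# Remark 6pe: P(|est - theta| >= eps) <= MSE / eps^2
eps <- 2
c(probability = mean(abs(est - theta) >= eps), bound = mse / eps^2)
```

The bound is usually far from tight; it is useful because its limit depends only on the limit of the mean square error.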
Additionally, since $MSE(\hat{\theta})$, $b(\hat{\theta})^2$ and $Var(\hat{\theta})$ are nonnegative, the mean square error is zero if and only if the other two are zero at the same time. The same happens for their limits. That is why we are allowed to split the limit of the mean square error into two limits.
Exercise 1pe-p
The efficiency (in lumens per watt, u) of light bulbs of a certain type has a population mean of 9.5u and a standard deviation of 0.5u, according to production specifications. The specifications for a room in which eight of these bulbs (the simple random sample) are to be installed call for the average efficiency of the eight bulbs to exceed 10u. Find the probability that this specification for the room will be met, assuming that efficiency measurements are normally distributed.
(From Mathematical Statistics with Applications, Mendenhall, W., D.D. Wackerly and R.L. Scheaffer, Duxbury Press.)

Discussion: The supposition that efficiency measurements follow the distribution $N(\mu = 9.5u, \sigma = 0.5u)$ should be tested by applying an appropriate statistical technique. The event is defined in terms of $\bar{X}$. We think about making the proper statistic appear, and hence being allowed to use its sampling distribution.
Identification of the variable and selection of the statistic: The variable is the efficiency of the light bulbs, while the estimator is the sample mean of eight elements. Since the population is normal and the two population parameters are known, we will consider the dimensionless statistic:
$$T(X;\mu) = \frac{\bar{X}-\mu}{\sqrt{\sigma^2/n}} \sim N(0,1)$$
Rewriting the event: Although in this case the sampling distribution of $\bar{X}$ is known, as $\bar{X} \sim N\left(\mu, \sqrt{\sigma^2/n}\right)$, we need to standardize before consulting the table of the standard normal distribution:
$$P(\bar{X} > 10) = P\left(\frac{\bar{X}-\mu}{\sqrt{\sigma^2/n}} > \frac{10-\mu}{\sqrt{\sigma^2/n}}\right) = P\left(T > \frac{10-9.5}{\sqrt{0.5^2/8}}\right) = P\left(T > \frac{0.5\sqrt{8}}{0.5}\right) = P(T > \sqrt{8}) \approx 0.0023$$
where in this case the language R has been used:
> 1 - pnorm(sqrt(8),0,1)
[1] 0.002338867
Conclusion: The production specifications will be met, for the room mentioned, with a probability of 0.0023; that is, they will hardly be met.
My notes:
Exercise 2pe-p
When a production process is working properly, the resistance of the components follows a normal distribution with standard deviation 4.68u. A simple random sample with four components is taken. What is the probability that the sample quasivariance will be bigger than 30u²?
Discussion: In this exercise, the supposition that the normal distribution reasonably explains the variable resistance should be evaluated by using proper statistical techniques. The question involves $S^2$. Again, it is necessary to make the proper statistic appear, in order to use its sampling distribution.
Identification of the variable: $R$ = resistance (of one component), $R \sim N(\mu, \sigma = 4.68u)$
Sample and statistic: $R_1, R_2, R_3, R_4$ (the resistance of four components is measured), and the sample quasivariance
$$S^2 = \frac{1}{4-1}\sum_{j=1}^{4}\left(R_j - \bar{R}\right)^2$$
Search for a known distribution: The quantity required is $P(S^2 > 30)$. To calculate the probability of an event, we need to know the distribution of the random quantity involved.
In this case, we do not know the sampling distribution of $S^2$, but since $R$ follows a normal distribution we are allowed to use

$$T = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
Then, by completing the inequality with the necessary constants until making $T$ appear:
$$P(S^2 > 30) = P\left(\frac{(n-1)S^2}{\sigma^2} > \frac{(4-1)\cdot 30}{\sigma^2}\right) = P\left(T > \frac{90}{4.68^2}\right) = P(T > 4.11)$$
where $T \sim \chi^2_3$. Multiplying and dividing by positive quantities has not changed the inequality.
Table of the $\chi^2$ distribution: Since $n-1 = 3$, it is necessary to look at the third row. The probabilities in the table are given for events of the form $P(T < x)$ (or $P(T \le x)$, as the distribution is continuous), and therefore the complement of the event must be considered:
$$P(T > 4.11) = 1 - P(T \le 4.11) = 1 - 0.75 = 0.25$$
Conclusion: The probability of the event is 0.25. This means that $S^2$ will sometimes take a value larger than 30u² when evaluated at specific data $x$ coming from the mentioned distribution.
My notes:
Exercise 3pe-p
A simple random sample of 270 homes was taken from a large population of older homes to estimate the proportion of homes with unsafe wiring. If, in fact, 20% of homes have unsafe wiring, what is the probability that the sample proportion will be between 16% and 24%?
Hint: Since probabilities and proportions are measured in a 0-to-1 scale, write all quantities in this scale.
(From Statistics for Business and Economics, Newbold, P., W.L. Carlson and B.M. Thorne, Pearson.)
LINGUISTIC NOTE (From: The Careful Writer: A Modern Guide to English Usage. Bernstein, T.M. Atheneum)
home, house. It is a tribute to the unquenchable sentimentalism of users of English that one of the matters of usage that seem to agitate them the most is the use of home to designate a structure designed for residential purposes. Their contention is that what the builder erects is a house and that the occupants then fashion it into a home. That is, or at least was, basically true, but the distinction has become blurred. Nor is this solely the doing of the real estate operators. They do, indeed, lure prospective buyers not with the thought of mere masonry but with glowing picture of comfort, congeniality, and family collectivity that make a house into a home. But the prospective buyers are their co-conspirators; they, too, view the premises not as a heap of stone and wood but as a potential abode.
There may be areas in which the words are not used interchangeably. In legal or quasi-legal terminology we speak of a house and lot, not a home and lot. The police and fire departments usually speak of a robbery or a fire in a house, not a home, at Main Street and First Avenue. And the individual most often buys a home, but sells his house (there, apparently, speaks sentiment again). But in most areas the distinction between the words has become obfuscated. When a flood or a fire destroys a community, it wipes out not merely houses but homes as well, and homes has come to be accepted in this sense. No one would discourage the sentimentalists from trying to pry the two words apart, but it would be rash to predict much success for them.
Discussion: The information of this real-world study must be translated into the mathematical language. Since there are two possible situations, each home can be modeled by using a Bernoulli variable. Although

given in a 0-to-100 scale, the population and sample proportions (always in a 0-to-1 scale) are involved. The dimensionless character of a proportion is due to its definition. Note that if the data $x_1,\dots,x_n$ are taken and we have access to them, there is nothing random any longer. The lack of knowledge, as if we had to select $n$ elements to build $X_1,\dots,X_n$, justifies the use of Probability Theory.
Identification of the variable and selection of the statistic: The variable having unsafe wiring can take two possible values: 0 (not having unsafe wiring) and 1 (having it), if one wants to register or count this fact. The theoretical proportion of older homes with unsafe wiring is known: $\eta = 0.20$ (20%). For this framework (a large sample from a Bernoulli population with parameter $\eta$) we select the dimensionless, asymptotic statistic:
$$T(X;\eta) = \frac{\hat{\eta}-\eta}{\sqrt{\frac{?\,(1-?)}{n}}} \xrightarrow{d} N(0,1)$$
where ? is substituted by the best information available about the parameter: $\eta$ or $\hat{\eta}$. Here we know $\eta$.
Rewriting the event: We are asked for the probability $P(0.16 < \hat{\eta} < 0.24)$, but to calculate it we need to rewrite the event until making $T$ appear:
$$P(0.16 < \hat{\eta} < 0.24) = P\left(\frac{0.16-0.20}{\sqrt{\frac{0.20\,(1-0.20)}{270}}} < \frac{\hat{\eta}-\eta}{\sqrt{\frac{\eta(1-\eta)}{n}}} < \frac{0.24-0.20}{\sqrt{\frac{0.20\,(1-0.20)}{270}}}\right) = P(-1.64 < T < 1.64) = P(T < 1.64) - P(T \le -1.64)$$
In these calculations, we have standardized and then decomposed, but it is also possible to decompose and then standardize. Now, let us assume that we have a table of the standard normal distribution including positive quantiles only. By using a simple plot with the density function of this distribution, it is easy to see (look at the areas) that for the second probability $P(T \le -1.64) = P(T \ge 1.64) = 1 - P(T < 1.64)$, so
$$P(T < 1.64) - P(T \le -1.64) = P(T < 1.64) - \left[1 - P(T < 1.64)\right] = 2\,P(T < 1.64) - 1 = 2 \cdot 0.9495 - 1 = 0.899 \approx 0.90$$
Alternatively, by using the language R:
> pnorm(1.64,0,1) - pnorm(-1.64,0,1)
[1] 0.8989948
Conclusion: The probability of the event is 0.90, which means that the sample proportion of older homes with unsafe wiring, calculated from the sample $X_1,\dots,X_{270}$, will take a value between 0.16 and 0.24 with this probability. As a percentage: the proportion of the 270 homes with unsafe wiring will be between 16% and 24% with probability 90%.
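The table lookups of Exercises 2pe-p and 3pe-p can be reproduced with base R's pchisq() and pnorm(). This sketch uses the values appearing in the solutions above (n = 4 and sigma = 4.68 for 2pe-p; n = 270 and eta = 0.20 for 3pe-p); the unrounded quantile of 3pe-p is used instead of 1.64, so the result differs from the table value only in the third decimal.

```r
# Exercise 2pe-p: P(S^2 > 30), using (n - 1) S^2 / sigma^2 ~ chi-square with n - 1 df
n2 <- 4; sigma <- 4.68
q  <- (n2 - 1) * 30 / sigma^2          # about 4.11
p2 <- 1 - pchisq(q, df = n2 - 1)       # exact version of the table lookup

# Exercise 3pe-p: P(0.16 < eta_hat < 0.24), normal approximation with known eta
n3 <- 270; eta <- 0.20
se <- sqrt(eta * (1 - eta) / n3)
z  <- (0.24 - eta) / se                # about 1.64 (symmetric interval)
p3 <- pnorm(z) - pnorm(-z)

round(c(q = q, p2 = p2, z = z, p3 = p3), 4)
```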
My notes:
Exercise 4pe-p
Simple random samples $X_1,\dots,X_{11}$ and $Y_1,\dots,Y_6$ are taken from two independent populations
$$X \sim N(\mu_X = 1, \sigma_X^2 = 1) \quad\text{and}\quad Y \sim N(\mu_Y = 2, \sigma_Y^2 = 0.5)$$
Calculate or find:
(1) The probability $P(S_Y^2 \le 1.5)$.

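(2) The quantile $c$ such that $P(\bar{X} > c) = 0.25$.
(3) The probability $P(\bar{X} - \bar{Y} > 0.2)$.
(4) The quantile $c$ such that $P\left(\frac{S_X^2}{S_Y^2} \le c\right) = 0.9$.
Advanced item: The probability $P(\bar{X} + \bar{Y} > 0.2)$.
Discussion: There are two independent normal populations whose parameters are known. The variances, not the standard deviations, are given. It is required to calculate probabilities or find quantiles for events involving the sample means and the sample quasivariances. In the first two sections, only one of the populations is involved. Sample sizes are 11 and 6, respectively. The variables $X$ and $Y$ are dimensionless, and so are both sides of the inequalities.
(1) The event involves the estimator $S_Y^2$, which reminds us of the statistic $T = \frac{(n_Y-1)S_Y^2}{\sigma_Y^2} \sim \chi^2_{n_Y-1}$. Then,
$$P(S_Y^2 \le 1.5) = P\left(\frac{(n_Y-1)S_Y^2}{\sigma_Y^2} \le \frac{(6-1)\cdot 1.5}{0.5}\right) = P(T \le 15) \approx 0.99$$
where $T \sim \chi^2_5$ (in the table, the 0.99 quantile of $\chi^2_5$ is 15.09).
(2) The event involves $\bar{X}$, so we think about the statistic $T = \frac{\bar{X}-\mu_X}{\sqrt{\sigma_X^2/n_X}} \sim N(0,1)$. Then,
$$0.25 = P(\bar{X} > c) = P\left(T > \frac{c-\mu_X}{\sqrt{\sigma_X^2/n_X}}\right) = P\left(T > \frac{c-1}{\sqrt{1/11}}\right) \iff P\left(T \le \frac{c-1}{\sqrt{1/11}}\right) = 0.75$$
Now, the quantile found in the table of the standard normal distribution must verify that
$$r_{0.25} = l_{0.75} = 0.67 = \frac{c-1}{\sqrt{1/11}} \iff c = 1 + 0.67\sqrt{\frac{1}{11}} \approx 1.20$$
(3) To work with the means of two populations, we use
$$T = \frac{(\bar{X}-\bar{Y}) - (\mu_X-\mu_Y)}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}} \sim N(0,1),$$
so
$$P(\bar{X} - \bar{Y} > 0.2) = P\left(T > \frac{0.2 - (1-2)}{\sqrt{\frac{1}{11} + \frac{0.5}{6}}}\right)$$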

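$$= P(T > 2.87) = 1 - P(T \le 2.87) = 1 - 0.9979 = 0.0021$$
(4) To work with the variances of two populations, $T = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{n_X-1,\,n_Y-1}$ is used:
$$0.9 = P\left(\frac{S_X^2}{S_Y^2} \le c\right) = P\left(\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \le c\,\frac{\sigma_Y^2}{\sigma_X^2}\right) = P\left(T \le \frac{0.5}{1}\,c\right) = P(T \le 0.5\,c)$$
where $T \sim F_{11-1,\,6-1} = F_{10,5}$. The quantile found in the table of the $F$ distribution, $r_{0.1} = l_{0.9} = 3.30$, allows us to find the unknown $c$:
$$0.5\,c = 3.30 \iff c = 6.60$$
In R:
> qf(0.9, 10, 5)
[1] 3.297402
Advanced item: In this case, allocating the two sample means on the same side of the inequality leads to the sum $\bar{X} + \bar{Y}$. We remember that $\bar{X} \sim N\left(\mu_X, \sqrt{\sigma_X^2/n_X}\right)$ and $\bar{Y} \sim N\left(\mu_Y, \sqrt{\sigma_Y^2/n_Y}\right)$, so the rules that govern the sums (and hence subtractions) of independent normally distributed variables imply both
$$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y, \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}\right) \quad\text{and}\quad \bar{X} + \bar{Y} \sim N\left(\mu_X + \mu_Y, \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}\right)$$
Note that in both cases the variances are added (uncertainty increases). Although the difference is used more frequently, to compare two populations, the sampling distribution of the sum of the sample means is also known thanks to the rules for normal variables; alternatively, we could still use the first result by writing $\bar{X} + \bar{Y} = \bar{X} - (-\bar{Y})$ and using that $-\bar{Y}$ has mean and variance equal to $-\mu_Y$ and $\sigma_Y^2/n_Y$. Either way, after standardizing:
$$T = \frac{(\bar{X}+\bar{Y}) - (\mu_X+\mu_Y)}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}} \sim N(0,1).$$
This is the mathematical tool necessary to work with the event. Now,
$$P(\bar{X} + \bar{Y} > 0.2) = P\left(T > \frac{0.2 - (1+2)}{\sqrt{\frac{1}{11} + \frac{0.5}{6}}}\right) = P(T > -6.71) = 1 - P(T \le -6.71)$$
The quantile $-6.71$ is not usually in the tables of the $N(0,1)$, so we can consider that $P(T \le -6.71) \approx 0$ and hence that the probability is approximately 1. Or, if we use the programming language R:
> 1-pnorm(-6.71,0,1)
[1] 1
Conclusion: For each case, we have selected the appropriate statistic. After completing the expression of the event, the statistic $T$ appears. Then, since the sampling distribution of $T$ is known, the tables can be used to calculate probabilities or to find quantiles. In the latter case, the unknown $c$ is found after the quantile of $T$.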
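All five answers of Exercise 4pe-p can be recomputed in one place with base R's distribution functions. This is a sketch that takes the parameter values used in the solution above as given (mu_X = 1, sigma_X^2 = 1, n_X = 11, mu_Y = 2, sigma_Y^2 = 0.5, n_Y = 6); because R uses unrounded quantiles (for example qnorm(0.75) = 0.674... instead of 0.67), the results agree with the table-based answers only up to rounding.

```r
# Parameters and sample sizes as used in the solution of Exercise 4pe-p
muX <- 1; s2X <- 1;   nX <- 11
muY <- 2; s2Y <- 0.5; nY <- 6
sd_comb <- sqrt(s2X / nX + s2Y / nY)   # same for the sum and the difference of the means

p1 <- pchisq((nY - 1) * 1.5 / s2Y, df = nY - 1)        # (1) P(S_Y^2 <= 1.5)
c2 <- muX + qnorm(0.75) * sqrt(s2X / nX)               # (2) P(Xbar > c) = 0.25
p3 <- 1 - pnorm((0.2 - (muX - muY)) / sd_comb)         # (3) P(Xbar - Ybar > 0.2)
c4 <- qf(0.9, df1 = nX - 1, df2 = nY - 1) * s2X / s2Y  # (4) P(S_X^2 / S_Y^2 <= c) = 0.9
p5 <- 1 - pnorm((0.2 - (muX + muY)) / sd_comb)         # advanced: P(Xbar + Ybar > 0.2)

round(c(p1 = p1, c2 = c2, p3 = p3, c4 = c4, p5 = p5), 4)
```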

My notes:
Exercise 5pe-p
Suppose that you manage a bank where the amounts (in dollars) of daily deposits and daily withdrawals are given by independent random variables with normal distributions. For deposits, the mean is 12,000 and the standard deviation is 4,000; for withdrawals, the mean is 10,000 and the standard deviation is 5,000.
(a) For a week, calculate or bound the probability that the five withdrawals will add up to more than 55,000.
(b) For a particular day, calculate or bound the probability that withdrawals will exceed deposits by more than 5,000.
Imagine that you are to launch a new monthly product. A prospective study indicated that profits (in million dollars) can be modeled through the random quantity $Q = X/1.325$, where $X$ follows a $t$ distribution with twenty degrees of freedom.
(c) For a particular month, calculate or bound the probability that profits will be smaller than $10^6$ (one million dollars).
(Based on an exercise of Business Statistics, Douglas Downing and Jeffrey Clark, Barron's.)
Discussion: There are several suppositions implicit in the statement, namely: (i) the normal distribution can reasonably be used to model the two variables of interest $D$ and $W$; (ii) withdrawals and deposits are independent; and (iii) $X$ can reasonably be modeled by using the $t$ distribution. These suppositions should first be evaluated by using proper statistical techniques. To solve this exercise, the rules on sums and differences of normally distributed variables must be used.
Identification of variables and distributions: If $D$ and $W$ represent the random variables daily sum of deposits and daily sum of withdrawals, respectively, from the statement we have that
$$D \sim N(\mu_D = 12{,}000,\ \sigma_D = 4{,}000) \quad\text{and}\quad W \sim N(\mu_W = 10{,}000,\ \sigma_W = 5{,}000)$$
(a) Since the variables are measured daily, in a week we have five measurements (one for each working day).
Translation into the mathematical language: We are asked for the probability
$$P(W_1 + W_2 + W_3 + W_4 + W_5 > 55{,}000) = P\left(\sum_{j=1}^{5} W_j > 55{,}000\right)$$
Search for a known distribution: To calculate or bound this probability, we need to know the distribution of the sum or, alternatively, to relate it to any quantity whose distribution we know.
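By using the rules that govern the sums and subtractions of independent normal variables,
$$\sum_{j=1}^{5} W_j \sim N\left(5\mu_W, \sqrt{5\sigma_W^2}\right)$$
Rewriting the event: We can easily rewrite the event in terms of the standardized version of this normal distribution:
$$P\left(\sum_{j=1}^{5} W_j > 55{,}000\right) = P\left(\frac{\sum_{j=1}^{5} W_j - 5\mu_W}{\sqrt{5\sigma_W^2}} > \frac{55{,}000 - 5\mu_W}{\sqrt{5\sigma_W^2}}\right) = P\left(Z > \frac{55{,}000 - 50{,}000}{\sqrt{5}\cdot 5{,}000}\right) = P(Z > 0.447)$$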

Consulting the table: Finally, it is enough to consult the table of the standard normal distribution $Z$. On the one hand, in the table we are given values for the quantiles 0.44 and 0.45, so we could round the value 0.447 to the closest one, 0.45, or, more exactly, we can bound the probability. On the other hand, our table provides lower-tail probabilities, so we will consider the complement of some events. By comparing areas under the density, it is easy to deduce that
$$P(Z > 0.44) > P(Z > 0.447) > P(Z > 0.45)$$
$$1 - P(Z \le 0.44) > P(Z > 0.447) > 1 - P(Z \le 0.45)$$
$$1 - 0.6700 > P(Z > 0.447) > 1 - 0.6736$$
$$0.3300 > P(Z > 0.447) > 0.3264$$
Then,
$$0.3264 < P\left(\sum_{j=1}^{5} W_j > 55{,}000\right) < 0.3300$$
Note: It is also possible to relate the total sum to the sample mean,
$$P\left(\sum_{j=1}^{5} W_j > 55{,}000\right) = P\left(\frac{1}{5}\sum_{j=1}^{5} W_j > \frac{55{,}000}{5}\right) = P(\bar{W} > 11{,}000),$$
and use that $\bar{W} = \frac{1}{5}\sum_{j=1}^{5} W_j \sim N\left(\mu_W, \sqrt{\sigma_W^2/5}\right)$, that is, $\frac{\bar{W}-\mu_W}{\sqrt{\sigma_W^2/5}} \sim N(0,1)$.
(b) Translation into the mathematical language: We are asked for the probability $P(W > D + 5{,}000)$.
Search for a known distribution: To calculate or bound this probability, we rewrite the event until all random quantities are on the left side of the inequality:
$$P(W > D + 5{,}000) = P(W - D > 5{,}000)$$
Now we need to know the distribution of $W - D$ or, alternatively, of a quantity involving this difference. By again using the rules that govern the sums and differences of independent normal variables, it holds that
$$W - D \sim N\left(\mu_W - \mu_D, \sqrt{\sigma_W^2 + \sigma_D^2}\right) = N\left(10{,}000 - 12{,}000, \sqrt{5{,}000^2 + 4{,}000^2}\right)$$
Rewriting the event: We can easily express the event in terms of the standardized version of $W - D$:
$$P(W - D > 5{,}000) = P\left(\frac{(W-D) - (\mu_W-\mu_D)}{\sqrt{\sigma_W^2+\sigma_D^2}} > \frac{5{,}000 - (-2{,}000)}{\sqrt{25\cdot 10^6 + 16\cdot 10^6}}\right) = P\left(Z > \frac{7\cdot 10^3}{\sqrt{41}\cdot 10^3}\right) = P(Z > 1.093)$$
Consulting the table: We can bound the probability as follows:
$$P(Z > 1.09) > P(Z > 1.093) > P(Z > 1.10)$$
$$1 - P(Z \le 1.09) > P(Z > 1.093) > 1 - P(Z \le 1.10)$$
$$1 - 0.8621 > P(Z > 1.093) > 1 - 0.8643$$
$$0.1379 > P(Z > 1.093) > 0.1357$$
Then,
$$0.1357 < P(W > D + 5{,}000) < 0.1379$$
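Parts (a) and (b) can also be computed exactly with pnorm(), whose mean and sd arguments make the manual standardization unnecessary. This minimal sketch relies only on the distributions stated above and on the independence assumed in the Discussion; the exact values fall inside the bounds obtained from the table.

```r
muD <- 12000; sdD <- 4000    # deposits
muW <- 10000; sdW <- 5000    # withdrawals

# (a) The sum of five independent daily withdrawals is N(5 * muW, sqrt(5) * sdW)
pa <- 1 - pnorm(55000, mean = 5 * muW, sd = sqrt(5) * sdW)

# (b) By independence, W - D is N(muW - muD, sqrt(sdW^2 + sdD^2))
pb <- 1 - pnorm(5000, mean = muW - muD, sd = sqrt(sdW^2 + sdD^2))

round(c(pa = pa, pb = pb), 4)
```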

(c) Translation into the mathematical language: We are asked for
$$P(10^6\,Q < 10^6) = P\left(\frac{X}{1.325} < 1\right)$$
Search for a known distribution: We do not know the distribution of $X/1.325$, but we know that $X \sim t_{20}$.
Rewriting the event: The event can easily be rewritten in terms of this known distribution:
$$P\left(\frac{X}{1.325} < 1\right) = P(X < 1.325)$$
Consulting the table: Finally, it is enough to consult the table of the $t$ distribution. The quantity 1.325 is in our table of lower-tail probabilities (row of twenty degrees of freedom), so $P(X < 1.325) = 0.900$.
Conclusion: For a week, the probability that the five withdrawals will add up to more than 55,000 dollars is around 0.33. For a particular day, the probability that withdrawals will exceed deposits by more than 5,000 dollars is around 0.14. For a particular month, the probability that profits will be smaller than one million dollars is 0.90, that is, quite high.
My notes:
Exercise 6pe-p
To study the mean of a population variable $X$, $\mu = E(X)$, a simple random sample of size $n$ is considered. Imagine that we do not trust the first and the last data, so we think about using the statistic
$$\tilde{X} = \frac{1}{n-2}\sum_{j=2}^{n-1} X_j$$
Calculate the expectation and the variance of this statistic. Calculate the mean square error $MSE(\tilde{X})$ and its limit when $n$ tends to infinity. Study the consistency. Compare the previous error with that of the ordinary sample mean.
Discussion: The statement of this exercise is mathematical. Here we are interested in the mean. The quantity $X$ is dimensionless. We cannot apply the definitions directly, so the mean and the variance of $\tilde{X}$ must be written in terms of the mean and the variance of $X$ by applying the basic properties of these measures.
Expectation and variance: The basic properties of the mean and the variance are applied:
$$E(\tilde{X}) = E\left(\frac{1}{n-2}\sum_{j=2}^{n-1} X_j\right) = \frac{1}{n-2}\sum_{j=2}^{n-1} E(X_j) = \frac{1}{n-2}(n-2)\mu = \mu$$
$$Var(\tilde{X}) = Var\left(\frac{1}{n-2}\sum_{j=2}^{n-1} X_j\right) = \frac{1}{(n-2)^2}\sum_{j=2}^{n-1} Var(X_j) = \frac{1}{(n-2)^2}(n-2)\sigma^2 = \frac{\sigma^2}{n-2}$$
where the variance of the sum equals the sum of the variances because the $X_j$ of a simple random sample are independent. When $n$ increases, that is, when the sample consists of more and more data, the limits are, respectively:
$$\lim_{n\to\infty} E(\tilde{X}) = \lim_{n\to\infty} \mu = \mu \quad\text{and}\quad \lim_{n\to\infty} Var(\tilde{X}) = \lim_{n\to\infty} \frac{\sigma^2}{n-2} = 0$$
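The expectation and variance just derived can be checked by simulation. In this sketch, a normal population with mu = 10, sigma = 2, sample size n = 12 and 20,000 replications are illustrative assumptions (the formulas hold for any population with finite variance); the simulated variances should be close to sigma^2/(n-2) for the statistic of the statement and sigma^2/n for the ordinary sample mean.

```r
set.seed(2024)
mu <- 10; sigma <- 2; n <- 12; R <- 20000   # illustrative values (assumptions)

sims <- replicate(R, {
  x <- rnorm(n, mean = mu, sd = sigma)
  c(tilde = mean(x[2:(n - 1)]),   # statistic of the exercise: first and last data left out
    bar   = mean(x))              # ordinary sample mean
})

m <- apply(sims, 1, mean)   # Monte Carlo expectations
v <- apply(sims, 1, var)    # Monte Carlo variances

rbind(simulated   = c(m, v),
      theoretical = c(mu, mu, sigma^2 / (n - 2), sigma^2 / n))
```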

Consistency: The previous limits show that $\tilde{X}$ has some basic desirable properties: asymptotic unbiasedness and evanescent variance. This pair is equivalent to the evanescence of the mean square error $MSE(\tilde{X})$, that is, to the consistency in mean of order two (a sufficient, but not necessary, condition for the consistency in probability).
Comparison of errors:
$$MSE(\tilde{X}) = \frac{\sigma^2}{n-2} \qquad MSE(\bar{X}) = \frac{\sigma^2}{n}$$
Since $\sigma^2$ appears in the two positive quantities, by looking at the coefficients it is easy to see that $MSE(\bar{X}) < MSE(\tilde{X})$ for $n$ larger than 2. This result is due to the fact that the sample mean uses all the data available, though only the number of data (not their quality, since all of them are supposed to follow the same distribution) is considered in calculating the mean square error. In the limit, the 2 is negligible compared with $n$. We can plot the coefficients (they are also the mean square errors when $\sigma = 1$):
# Grid of values for 'n'
n <- seq(from = 3, to = 20, by = 1)
# The two sequences of coefficients
coeff1 <- 1/(n - 2)
coeff2 <- 1/n
# The plot: common vertical limits so that the panels are comparable
allValues <- c(coeff1, coeff2)
yLim <- c(min(allValues), max(allValues))
dev.new(); par(mfcol = c(1, 3))
plot(n, coeff1, xlim = c(min(n), max(n)), ylim = yLim, xlab = ' ', ylab = ' ', main = 'Coefficients 1', type = 'l')
plot(n, coeff2, xlim = c(min(n), max(n)), ylim = yLim, xlab = ' ', ylab = ' ', main = 'Coefficients 2', type = 'b')
plot(n, coeff1, xlim = c(min(n), max(n)), ylim = yLim, xlab = ' ', ylab = ' ', main = 'All coefficients', type = 'l')
points(n, coeff2, type = 'b')
This code generates an array of three figures (panels 'Coefficients 1', 'Coefficients 2' and 'All coefficients', with the coefficients plotted against n). Asymptotically, both estimators behave similarly, since $\frac{1}{n-2} \approx \frac{1}{n}$ for large $n$.
Conclusion: $\tilde{X}$ is a consistent estimator of $\mu$. Nevertheless, the ordinary sample mean $\bar{X}$, with smaller mean square error, is preferable for estimating $\mu$. When nothing suggests removing data, it is better to keep them in the sample.
Advanced theory: The estimator in the statement is the usual sample mean when the sample has $n-2$ data instead of $n$ (leaving out these two data can be seen as a sort of data treatment implemented in the method, not in the previous analysis of data). When either of the two left-out data is not trustworthy, using this estimator makes sense; otherwise, it does not exploit the information available efficiently. On the other hand, the sample mean can be affected by tiny or huge values (outliers).
To make the sample mean robust, this estimator is sometimes considered after ordering the data from the smallest to the largest; if $X_{(j)}$ is the $j$-th datum in the sample already reordered:
$$\tilde{X} = \frac{1}{n-2}\sum_{j=2}^{n-1} X_{(j)}$$
This new robust estimator of the population mean $\mu$ is called trimmed sample mean, and any number of data can be left out (not only two).
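The difference between leaving out the first and last data and leaving out the smallest and largest data can be seen on a tiny example. The data below are invented for illustration, with one gross outlier placed in the middle of the sample; only the trimmed mean (computed on the sorted data) removes its influence.

```r
# Hypothetical data with one outlier (invented for illustration)
x <- c(4.8, 5.1, 50, 4.9, 5.2, 5.0)
n <- length(x)

x_bar     <- mean(x)                    # ordinary sample mean
x_tilde   <- mean(x[2:(n - 1)])         # leaves out the FIRST and LAST data (statement)
x_trimmed <- mean(sort(x)[2:(n - 1)])   # leaves out the SMALLEST and LARGEST data

round(c(bar = x_bar, tilde = x_tilde, trimmed = x_trimmed), 3)
```

Here the ordinary mean is 12.5 and the statistic of the statement is 16.3, while the trimmed mean is 5.05. Base R's mean() also offers a trim argument, which removes a given fraction of the observations from each end of the sorted data.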

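My notes:
Exercise 7pe-p
A population variable $X$ follows the $\chi^2$ distribution with $\kappa$ degrees of freedom. We consider a statistic $T$ that uses the information contained in the simple random sample $X_1, X_2, \dots, X_n$. If $T = T(X_1, X_2, \dots, X_n) = \frac{2}{n}\sum_{j=1}^{n} X_j$, calculate its expectation and variance. Calculate the mean square error of $T$. As an estimator of twice the mean of the population law, is $T$ a consistent estimator?
Hint: If $X$ follows the $\chi^2$ distribution with $\kappa$ degrees of freedom, $\mu = E(X) = \kappa$ and $\sigma^2 = Var(X) = 2\kappa$.
Discussion: Even if a population is mentioned, this statement is mathematical. To calculate the value of these two properties of the sampling distribution of $T$, we have to apply the general properties of the expectation and the variance. The knowledge about the distribution of $X$ will be used in the last steps. This is a dimensionless quantity. The mean square error is defined in terms of these quantities.
Expectation (or mean):
$$E(T) = E\left[\frac{2}{n}\sum_{j=1}^{n} X_j\right] = \frac{2}{n}\sum_{j=1}^{n} E(X_j) = \frac{2}{n}\sum_{j=1}^{n} E(X) = \frac{2}{n}\,n\,\kappa = 2\kappa$$
since $\mu = E(X) = \kappa$.
Variance:
$$Var(T) = Var\left[\frac{2}{n}\sum_{j=1}^{n} X_j\right] = \frac{4}{n^2}\sum_{j=1}^{n} Var(X_j) = \frac{4}{n^2}\sum_{j=1}^{n} Var(X) = \frac{4}{n^2}\,n\,2\kappa = \frac{8\kappa}{n}$$
where the variance of the sum equals the sum of the variances by the independence of the $X_j$ (simple random sample), and $\sigma^2 = Var(X) = 2\kappa$.
Mean square error: Since $b(T) = E(T) - E(X) = 2\kappa - \kappa = \kappa$, then
$$MSE(T) = b(T)^2 + Var(T) = \kappa^2 + \frac{8\kappa}{n}$$
Consistency: Although the variance of $T$ tends to zero when $n$ increases, the bias does not; thus, $T$ is asymptotically biased. Hence, the mean square error does not tend to zero either, and nothing can be said about the consistency in probability in this way (although we can say that it is not consistent in mean of order two). As an estimator of twice the mean, $2\mu = 2\kappa$, instead, $T$ is unbiased and its mean square error $8\kappa/n$ tends to zero, so in that role it is consistent in mean of order two and hence in probability.
Conclusion: Since the mean square error tends to $\kappa^2$, in general $T$ is not a good estimator of $\mu$ even for many data.
My notes: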
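A short simulation makes the two readings of $T$ concrete. The sketch assumes, as the variance $8\kappa/n$ indicates, that $T = 2\bar{X}$; the values kappa = 3, n = 50 and 20,000 replications are illustrative. The mean square error around $\mu = \kappa$ stays near $\kappa^2$, while around $2\mu = 2\kappa$ it is only $8\kappa/n$.

```r
set.seed(7)
kappa <- 3; n <- 50; R <- 20000          # illustrative values (assumptions)

# T = 2 * sample mean, computed on R samples from the chi-square(kappa) population
T_sim <- replicate(R, 2 * mean(rchisq(n, df = kappa)))

mse_mu  <- mean((T_sim - kappa)^2)       # T as an estimator of mu = kappa
mse_2mu <- mean((T_sim - 2 * kappa)^2)   # T as an estimator of 2 * mu = 2 * kappa

rbind(simulated   = c(mse_mu, mse_2mu),
      theoretical = c(kappa^2 + 8 * kappa / n, 8 * kappa / n))
```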

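Exercise 8pe-p
Given a simple random sample of size $n = 2$, that is, $(X_1, X_2)$, the following estimators of $\mu = E(X)$ are defined:
$$\hat{\mu}_1 = \frac{1}{2}X_1 + \frac{1}{2}X_2 \qquad \hat{\mu}_2 = \frac{1}{3}X_1 + \frac{2}{3}X_2$$
(1) Calculate their mean square error. (2) Calculate the relative efficiency. (3) Which one would you use to estimate $\mu$?
(Based on an exercise of Statistics for Business and Economics. Newbold, P., W. Carlson and B. Thorne. Pearson-Prentice Hall.)
Discussion: This statement is basically mathematical. The relative efficiency is defined in terms of the mean square error of the estimators.
Means: By applying the basic properties of the expectation (or mean),
$$E(\hat{\mu}_1) = \frac{1}{2}E(X_1) + \frac{1}{2}E(X_2) = \frac{1}{2}\mu + \frac{1}{2}\mu = \mu$$
$$E(\hat{\mu}_2) = \frac{1}{3}E(X_1) + \frac{2}{3}E(X_2) = \frac{1}{3}\mu + \frac{2}{3}\mu = \mu$$
Variances: By applying the basic properties of the variance (and the independence of $X_1$ and $X_2$),
$$Var(\hat{\mu}_1) = \frac{1}{4}Var(X_1) + \frac{1}{4}Var(X_2) = \frac{1}{4}\sigma^2 + \frac{1}{4}\sigma^2 = \frac{\sigma^2}{2}$$
$$Var(\hat{\mu}_2) = \frac{1}{9}Var(X_1) + \frac{4}{9}Var(X_2) = \frac{1}{9}\sigma^2 + \frac{4}{9}\sigma^2 = \frac{5}{9}\sigma^2$$
Mean square errors:
$$MSE(\hat{\mu}_1) = b(\hat{\mu}_1)^2 + Var(\hat{\mu}_1) = \left[E(\hat{\mu}_1) - \mu\right]^2 + Var(\hat{\mu}_1) = [\mu - \mu]^2 + \frac{\sigma^2}{2} = \frac{\sigma^2}{2}$$
$$MSE(\hat{\mu}_2) = b(\hat{\mu}_2)^2 + Var(\hat{\mu}_2) = \left[E(\hat{\mu}_2) - \mu\right]^2 + Var(\hat{\mu}_2) = [\mu - \mu]^2 + \frac{5}{9}\sigma^2 = \frac{5}{9}\sigma^2$$
Relative efficiency: Since the bias is zero for unbiased estimators, the mean square error is equal to the variance and we will prefer the estimator with the smallest variance. An easy way of comparing two estimators consists in using the concept of relative efficiency, which is a simple quotient (take into account which estimator you allocate in the numerator). When this quotient is over one, the estimator in the denominator has smaller mean square error, and vice versa. In this case,
$$e(\hat{\mu}_1, \hat{\mu}_2) = \frac{MSE(\hat{\mu}_2)}{MSE(\hat{\mu}_1)} = \frac{\frac{5}{9}\sigma^2}{\frac{1}{2}\sigma^2} = \frac{10}{9} > 1 \;\Rightarrow\; \hat{\mu}_1 \text{ is preferred for estimating } \mu.$$
Conclusion: Both estimators are unbiased, while the first has smaller variance; then, the first is preferred. We have not mathematically proved that this first estimator minimizes the variance, so we cannot say that it is an efficient estimator.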
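Both estimators are linear combinations $w_1X_1 + w_2X_2$, so for independent data with common variance the calculations above reduce to $E = \mu\sum w_j$ and $Var = \sigma^2\sum w_j^2$. This small R sketch applies that rule to the two weight vectors; sigma2 = 1 is an arbitrary choice, since it cancels in the relative efficiency.

```r
# Weights of the two linear estimators of mu (sample of size 2)
w1 <- c(1/2, 1/2)
w2 <- c(1/3, 2/3)

# For independent X1, X2 with common mean mu and variance sigma^2:
#   E(sum w_j X_j) = mu * sum(w_j)  and  Var(sum w_j X_j) = sigma^2 * sum(w_j^2)
sigma2 <- 1                  # arbitrary; it cancels in the quotient
mse1 <- sigma2 * sum(w1^2)   # weights sum to 1, so unbiased and MSE = Var = sigma^2/2
mse2 <- sigma2 * sum(w2^2)   # weights sum to 1, so unbiased and MSE = Var = 5 sigma^2/9

c(sum_w1 = sum(w1), sum_w2 = sum(w2),
  mse1 = mse1, mse2 = mse2,
  rel_eff = mse2 / mse1)     # 10/9 > 1, so mu_hat_1 is preferred
```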

My notes:
Exercise 9pe-p
The mean $\mu = E(X)$ of any population can be estimated from a simple random sample of size $n$ through $\bar{X}$. Prove that:
(a) This estimator is always consistent.
(b) For a normally distributed population, this estimator is efficient.
Discussion: This statement is theoretical. The first section of this exercise needs calculations similar to those of previous exercises. To prove the efficiency, we have to apply its definition.
(a) Consistency: The expectation of the sample mean is always (for any population) the population mean. Nevertheless, we repeat the calculations:
$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{j=1}^{n} X_j\right) = \frac{1}{n}\sum_{j=1}^{n} E(X_j) = \frac{1}{n}\sum_{j=1}^{n} E(X) = \frac{1}{n}\,n\,E(X) = \mu$$
The variance of the sample mean is always (for any population) the population variance divided by $n$. We repeat the calculations too:
$$Var(\bar{X}) = Var\left(\frac{1}{n}\sum_{j=1}^{n} X_j\right) = \frac{1}{n^2}\sum_{j=1}^{n} Var(X_j) = \frac{1}{n^2}\sum_{j=1}^{n} Var(X) = \frac{1}{n^2}\,n\,\sigma^2 = \frac{\sigma^2}{n}$$
where the independence of the $X_j$ (simple random sample) has been used. The bias is defined as $b(\bar{X}) = E(\bar{X}) - \mu = 0$. We prove the consistency in probability by using the sufficient (but not necessary) condition, the consistency in mean of order two:
$$\lim_{n\to\infty} MSE(\bar{X}) = \lim_{n\to\infty}\left[b(\bar{X})^2 + Var(\bar{X})\right] = \lim_{n\to\infty}\left[0 + \frac{\sigma^2}{n}\right] = 0$$
Then, it is consistent in mean of order two and therefore in probability.
(b) Efficiency: It is necessary to prove that the two conditions of the definition are fulfilled:
i. The expectation of $\bar{X}$ is always $\mu = E(X)$, that is, $\bar{X}$ is always an unbiased estimator of $\mu$.
ii. $\bar{X}$ has minimum variance, which happens (because of a theoretical result) when $Var(\bar{X})$ attains the Cramér-Rao lower bound
$$\frac{1}{n\,E\left[\left(\frac{\partial \log[f(X;\theta)]}{\partial\theta}\right)^2\right]}$$
where $\theta = \mu$ in this case, and $f(x;\theta)$ is the probability function of the population law where the nonrandom variable $x$ is substituted by the random variable $X$ (otherwise, it is not possible to talk about expectation, since $f(x;\theta)$ is not random when $\theta$ is a parameter).
The unbiasedness is proved. On the other hand, we compute the Cramér-Rao lower bound step by step:
1. Function with $X$ in place of $x$:
$$f(X;\mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

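2. Logarithm of the function:
$$\log[f(X;\mu)] = \log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) + \log\left(e^{-\frac{(X-\mu)^2}{2\sigma^2}}\right) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(X-\mu)^2}{2\sigma^2}.$$
3. Partial derivative of the logarithm of the function:
$$\frac{\partial \log[f(X;\mu)]}{\partial\mu} = 0 - \frac{2(X-\mu)(-1)}{2\sigma^2} = \frac{X-\mu}{\sigma^2}.$$
4. Expectation of the squared partial derivative of the logarithm of the function. In this step, we rewrite the terms so as to make $\sigma^2 = Var(X) = E[(X-E(X))^2] = E[(X-\mu)^2]$ appear:
$$E\left[\left(\frac{\partial \log[f(X;\mu)]}{\partial\mu}\right)^2\right] = E\left[\frac{(X-\mu)^2}{\sigma^4}\right] = \frac{E[(X-\mu)^2]}{\sigma^4} = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}.$$
5. Cramér-Rao lower bound:
$$\frac{1}{n\,E\left[\left(\frac{\partial \log[f(X;\mu)]}{\partial\mu}\right)^2\right]} = \frac{1}{n\,\frac{1}{\sigma^2}} = \frac{\sigma^2}{n}.$$
The variance of the estimator, calculated in section (a), attains the bound, and hence the estimator has minimum variance. Since both conditions are fulfilled, the efficiency is proved.

Conclusion: We have proved that the sample mean is always (for any population) a consistent estimator of the population mean $\mu$. For a normal population, it is also efficient.

Advanced theory: When $\log[f(x;\theta)]$ is twice differentiable with respect to $\theta$, the Cramér-Rao bound can equivalently be written as
$$\frac{-1}{n\,E\left[\frac{\partial^2 \log[f(X;\theta)]}{\partial\theta^2}\right]}.$$
Concerning the regularity conditions, Wikipedia (http://en.wikipedia.org/wiki/fisher_information) refers to eq. (2.5.16) of Theory of Point Estimation, Lehmann, E. L. and G. Casella, 1998. Springer. Let us assume that this alternative expression can be applied; then, step 3 would be
$$\frac{\partial^2 \log[f(X;\mu)]}{\partial\mu^2} = \frac{\partial}{\partial\mu}\left(\frac{X-\mu}{\sigma^2}\right) = -\frac{1}{\sigma^2},$$
step 4 would be
$$E\left[\frac{\partial^2 \log[f(X;\mu)]}{\partial\mu^2}\right] = E\left[-\frac{1}{\sigma^2}\right] = -\frac{1}{\sigma^2},$$
and, finally, step 5 would be
$$\frac{-1}{n\left(-\frac{1}{\sigma^2}\right)} = \frac{\sigma^2}{n}.$$
We would have obtained the same result with easier calculations, although the fulfilment of the regularity conditions must have been verified previously.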
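The two forms of the Fisher information used above can also be checked numerically. The R sketch below integrates the squared score and minus the second derivative of $\log f$ against the normal density with `integrate()`; the function names and the values $\mu = 1$, $\sigma = 2$, $n = 10$ are illustrative assumptions, not part of the original solution.

```r
# Per-observation Fisher information about mu for N(mu, sigma^2), in the two
# equivalent forms: E[(d/dmu log f)^2] and -E[d^2/dmu^2 log f].
fisher_info_normal_mean <- function(mu, sigma) {
  score <- function(x) (x - mu) / sigma^2            # step 3
  form1 <- integrate(function(x) score(x)^2 * dnorm(x, mu, sigma),
                     lower = -Inf, upper = Inf)$value
  # The second derivative is the constant -1/sigma^2; integrating minus it
  # against the density returns 1/sigma^2.
  form2 <- integrate(function(x) (1 / sigma^2) * dnorm(x, mu, sigma),
                     lower = -Inf, upper = Inf)$value
  c(form1 = form1, form2 = form2)
}

# Cramér-Rao lower bound 1 / (n * I(mu)) for unbiased estimators of mu
cramer_rao_bound <- function(n, mu, sigma) {
  1 / (n * fisher_info_normal_mean(mu, sigma)[["form1"]])
}

fisher_info_normal_mean(mu = 1, sigma = 2)   # both approximately 1/sigma^2 = 0.25
cramer_rao_bound(n = 10, mu = 1, sigma = 2)  # approximately sigma^2/n = 0.4
```

Both integrals should agree with $1/\sigma^2$, and the bound with $Var(\bar X) = \sigma^2/n$.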

Exercise 10pe-p
Let $\theta$ be the parameter of a population random variable $X$ that follows a continuous uniform distribution on the interval $[\theta - 2, \theta + 1]$, and let $X_1, \ldots, X_n$ be a simple random sample; then,
(a) Plot the density function of the variable.
(b) Study the consistency of the sample mean when it is used to estimate the parameter $\theta$.
(c) Study the efficiency of the sample mean when it is used to estimate the parameter $\theta$.
(d) Find an unbiased estimator of $\theta$ and study its consistency.
Hint: Use that $E(X) = \theta - 1/2$ and $Var(X) = 3/4$.

Discussion: This statement is mathematical. We should know the density function of the continuous uniform distribution, although it could also be deduced from the fact that all possible values are equally likely. The quantity $X$ is dimensionless.

(a) Density function: For this distribution all values are equally likely, so the density function must be flat:
$$f(x;\theta) = \frac{1}{3} \ \text{ if } x \in [\theta-2, \theta+1], \qquad f(x;\theta) = 0 \ \text{ otherwise}.$$
[Figure: density function of X for a particular value of θ; the figure is similar for any other θ.]
This plot is not necessary for the following sections.

(b) Study of the consistency in probability of $\bar X$ as an estimator of $\theta$. We apply the sufficient condition of consistency in mean of order two:
$$\lim_{n\to\infty} MSE(\bar X) = 0 \iff \lim_{n\to\infty} b(\bar X) = 0 \ \text{ and } \ \lim_{n\to\infty} Var(\bar X) = 0.$$
(b1) Bias: By applying a property of the sample mean and the information of the statement,
$$E(\bar X) = E(X) = \theta - \frac{1}{2} \;\Rightarrow\; b(\bar X) = E(\bar X) - \theta = -\frac{1}{2} \;\Rightarrow\; \lim_{n\to\infty} b(\bar X) = -\frac{1}{2} \neq 0.$$
It is asymptotically biased. Since one condition of the pair is not verified, it is not necessary to check the other, and neither the consistency in probability nor the opposite can be proved this way (though the estimator is not consistent in the mean-square sense).

(c) Study of the efficiency of $\bar X$ as an estimator of $\theta$. The definition of efficiency consists of two conditions: unbiasedness and minimum variance (the latter is checked by comparing the variance with the Cramér-Rao bound).
(c1) Unbiasedness: In the previous section it has been proved that $\bar X$ is a biased estimator of $\theta$. The first condition does not hold, and hence it is not necessary to check the second one. The conclusion is that $\bar X$ is not an efficient estimator of $\theta$.
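The bias found in (b1) can be seen in a short simulation. The R sketch below assumes, as in the statement, that $X \sim U[\theta - 2, \theta + 1]$, so that $E(X) = \theta - 1/2$ and $Var(X) = 3/4$; the helper `simulate_mean_uniform` and the value $\theta = 5$ are hypothetical choices for illustration.

```r
# Hypothetical helper: draws of the sample mean for X ~ U[theta - 2, theta + 1]
simulate_mean_uniform <- function(theta, n, reps = 1e4) {
  replicate(reps, mean(runif(n, min = theta - 2, max = theta + 1)))
}

set.seed(2)
theta <- 5
for (n in c(10, 100, 1000)) {
  xbar <- simulate_mean_uniform(theta, n)
  # The bias stays near -1/2 for every n, while the variance shrinks like 3/(4n)
  cat(sprintf("n = %4d  bias = %+.3f  var = %.5f  (3/(4n) = %.5f)\n",
              n, mean(xbar) - theta, var(xbar), 3 / (4 * n)))
}
```

The variance vanishes as $n$ grows but the bias does not, which is why the sufficient condition cannot be used for $\bar X$.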

(d) An unbiased estimator of $\theta$ and its consistency: In (b) we found that $b(\bar X) = -1/2$, which suggests correcting the previous estimator by adding $1/2$, that is:
$$\hat\theta = \bar X + \frac{1}{2}.$$
To study its consistency in probability, we apply the sufficient condition mentioned in section (b), the consistency in mean of order two.
(d1) Bias: By applying a property of the sample mean and the information of the statement,
$$E(\hat\theta) = E(\bar X) + \frac{1}{2} = \theta - \frac{1}{2} + \frac{1}{2} = \theta \;\Rightarrow\; b(\hat\theta) = 0 \;\Rightarrow\; \lim_{n\to\infty} b(\hat\theta) = 0.$$
(d2) Variance: By applying a property of the sample mean and the information of the statement,
$$Var(\hat\theta) = Var\left(\bar X + \frac{1}{2}\right) = Var(\bar X) = \frac{Var(X)}{n} = \frac{3}{4n} \;\Rightarrow\; \lim_{n\to\infty} Var(\hat\theta) = 0.$$
As a conclusion, the mean square error tends to zero, and hence the proposed estimator $\hat\theta$ is consistent in mean square error and therefore in probability.

Conclusion: We could prove neither the consistency nor the efficiency of $\bar X$. Nevertheless, its bias has allowed us to build an unbiased, consistent estimator of the parameter. The efficiency of this new estimator could be studied, but it is not required in the statement.

Exercise 11pe-p
A population random quantity $X$ is supposed to follow a geometric distribution. Let $X_1, \ldots, X_n$ be a simple random sample. By applying the factorization theorem below, find a sufficient statistic $T = T(X_1, \ldots, X_n)$ for the parameter. Give explanations.

Discussion: The factorization theorem can be applied both to prove that a given statistic is sufficient and to find sufficient statistics. On the other hand, for the distribution involved we know that
$$f(x;\eta) = \eta(1-\eta)^{x-1}, \qquad x = 1, 2, \ldots$$
Likelihood function:
$$L(X_1, \ldots, X_n;\eta) = \prod_{j=1}^n f(X_j;\eta) = \eta(1-\eta)^{X_1-1}\cdots\eta(1-\eta)^{X_n-1} = \eta^n(1-\eta)^{\sum_{j=1}^n X_j - n}.$$
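A small R sketch can illustrate what the factorization means in practice: two samples with the same total give exactly the same likelihood for every value of $\eta$. It assumes the parameterization $f(x;\eta) = \eta(1-\eta)^{x-1}$, $x = 1, 2, \ldots$; since R's `dgeom` counts failures before the first success, `x - 1` is passed to it. The function names and the two samples are hypothetical.

```r
# Likelihood of a geometric sample, assuming f(x; eta) = eta * (1 - eta)^(x - 1)
# for x = 1, 2, ...; dgeom() counts failures, hence the shift x - 1.
geom_likelihood <- function(eta, x) {
  prod(dgeom(x - 1, prob = eta))
}

# Factorized form g(T; eta) * h(x) with T = sum(x) and h(x) = 1
geom_likelihood_factorized <- function(eta, x) {
  n <- length(x)
  eta^n * (1 - eta)^(sum(x) - n)
}

x_a <- c(1, 4, 2, 3)   # total 10
x_b <- c(5, 1, 1, 3)   # same total, different values
etas <- seq(0.05, 0.95, by = 0.05)
lik_a <- sapply(etas, geom_likelihood, x = x_a)
lik_b <- sapply(etas, geom_likelihood, x = x_b)
all.equal(lik_a, lik_b)   # TRUE: the data enter only through sum(x)
```

Equal likelihood functions for samples with the same total is exactly the sense in which $T = \sum_j X_j$ summarizes the sample for $\eta$.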

Theorem: We must try allocating each term of the likelihood function:
- $\eta^n$ depends only on the parameter, not on the $X_j$. Then, it would be part of $g$.
- $(1-\eta)^{\sum_j X_j - n}$ depends on both the parameter and the data $X_j$, and these two kinds of information are combined and cannot be mathematically separated. Then, it would be part of $g$, and the only possible sufficient statistic, if the theorem holds, is $T = \sum_{j=1}^n X_j$.

By considering $g(T;\eta) = \eta^n(1-\eta)^{T-n}$ and $h(X_1, \ldots, X_n) = 1$, the theorem holds, and hence the statistic $T = \sum_{j=1}^n X_j$ is sufficient for studying $\eta$. The idea behind this kind of statistic is that it summarizes the important information about the parameter contained in the sample. In fact, the statistic $T$ has essentially the same information as any one-to-one transformation of it, particularly the sample mean $T/n = \bar X$.

Conclusion: The factorization theorem has been used to find a sufficient statistic for the parameter. Since the total sum appears, we complete the expression to write the result in terms of the sample mean. Both statistics contain the same information about the parameter of the distribution.

Exercise 12pe-p *
For population variables $X$ and $Y$, simple random samples of sizes $n_X$ and $n_Y$ are taken. Calculate the mean square error of the following estimators, possibly by using proper statistics involving them whose sampling distribution is known.
(A) For any populations: $\bar X$ and $\bar X - \bar Y$.
(B) For Bernoulli populations: $\hat\eta$ and $\hat\eta_X - \hat\eta_Y$.
(C) For normal populations: $V^2$, $V_X^2/V_Y^2$, $s^2$, $s_X^2/s_Y^2$, $S^2$ and $S_X^2/S_Y^2$ (where $V^2 = \frac{1}{n}\sum_j (X_j-\mu)^2$, $s^2 = \frac{1}{n}\sum_j (X_j-\bar X)^2$ and $S^2 = \frac{1}{n-1}\sum_j (X_j-\bar X)^2$).
Suppose that the two populations are independent. Study the consistency in mean of order two and then the consistency in probability.

Discussion: In this exercise, the most important estimators are involved. The basic properties of the expectation and the variance allow us to calculate the mean square error. In most cases, the estimators will be completed so that a proper quantity with known sampling distribution appears, and then its properties will be used.
Although the estimators of the third section can be used for any $X$ and $Y$, the calculations for normally distributed variables are easier due to the use of additional information (the knowledge about statistics and their sampling distributions). Thus, the results of this section are based on the normality of the variables $X$ and $Y$. Some of the quantities are also valid for any variables.

The mean square errors are found for static situations, but the idea of limit involves dynamic situations. Statistically speaking, we want to study the behaviour of the estimators when the number of data increases: we can imagine a sequence of schemes where more and more data are added to the samples, that is, with the sample sizes always increasing. From the mathematical point of view, limits must be studied for any possible way in which the sample sizes tend to infinity. Fortunately, the limits of the two-variable functions (sequences, really) that appear in this exercise can easily be solved, either by decomposing them into two limits of one-variable functions or by bounding the two-variable sequences. That the limits are studied when $n_X$ and $n_Y$ tend to infinity facilitates the calculations (e.g. a constant such as the $-2$ in a factor $n_Y - 2$ is negligible).

(A) For any populations
(a1) For the sample mean $\bar X$. It holds that
$$E(\bar X) = \frac{1}{n}\sum_{j=1}^n E(X_j) = \mu, \qquad Var(\bar X) = \frac{1}{n^2}\sum_{j=1}^n Var(X_j) = \frac{\sigma^2}{n},$$
$$MSE(\bar X) = [E(\bar X) - \mu]^2 + Var(\bar X) = 0 + \frac{\sigma^2}{n} = \frac{\sigma^2}{n}.$$
Then,
- The estimator $\bar X$ is unbiased for $\mu$, whatever the sample size.
- The estimator $\bar X$ is consistent in mean of order two, and therefore in probability, for $\mu$, since $\lim_{n\to\infty} MSE(\bar X) = \lim_{n\to\infty} \frac{\sigma^2}{n} = 0$. The sample size tending to infinity is necessary and sufficient (see the mathematical appendix).

(a2) For the difference between the sample means $\bar X - \bar Y$. By using the previous results and the independence of the populations,
$$E(\bar X - \bar Y) = E(\bar X) - E(\bar Y) = \mu_X - \mu_Y, \qquad Var(\bar X - \bar Y) = Var(\bar X) + Var(\bar Y) = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y},$$
$$MSE(\bar X - \bar Y) = [E(\bar X - \bar Y) - (\mu_X - \mu_Y)]^2 + Var(\bar X - \bar Y) = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}.$$
The mean square error of $\bar X - \bar Y$ is the sum of the mean square errors of $\bar X$ and $\bar Y$. On the other hand,
- The estimator $\bar X - \bar Y$ is unbiased for $\mu_X - \mu_Y$, whatever the sample sizes.
- The estimator $\bar X - \bar Y$ is consistent in the mean-square sense, and hence in probability, for $\mu_X - \mu_Y$, as

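$$\lim_{n_X, n_Y\to\infty} MSE(\bar X - \bar Y) = \lim_{n_X, n_Y\to\infty}\left(\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}\right) = 0.$$
Both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).

(B) For Bernoulli populations
(b1) For the sample proportion $\hat\eta$. Since $\hat\eta$ is a particular case of the sample mean, with $\mu = \eta$ and $\sigma^2 = \eta(1-\eta)$,
$$E(\hat\eta) = \eta, \qquad Var(\hat\eta) = \frac{\eta(1-\eta)}{n}, \qquad MSE(\hat\eta) = [E(\hat\eta) - \eta]^2 + Var(\hat\eta) = \frac{\eta(1-\eta)}{n}.$$
Then,
- The estimator $\hat\eta$ is unbiased, whatever the sample size.
- It is consistent for $\eta$, the sample size tending to infinity being necessary and sufficient.

(b2) For the difference between the sample proportions $\hat\eta_X - \hat\eta_Y$. Again, this is a particular case of the difference between sample means:
$$E(\hat\eta_X - \hat\eta_Y) = \eta_X - \eta_Y, \qquad Var(\hat\eta_X - \hat\eta_Y) = \frac{\eta_X(1-\eta_X)}{n_X} + \frac{\eta_Y(1-\eta_Y)}{n_Y},$$
$$MSE(\hat\eta_X - \hat\eta_Y) = \frac{\eta_X(1-\eta_X)}{n_X} + \frac{\eta_Y(1-\eta_Y)}{n_Y}.$$
Then,
- The estimator $\hat\eta_X - \hat\eta_Y$ is unbiased for $\eta_X - \eta_Y$, whatever the sample sizes.
- It is also consistent for $\eta_X - \eta_Y$, both sample sizes tending to infinity being necessary and sufficient.

(C) For normal populations
(c1) For the variance of the sample $V^2$. By using $T = \frac{nV^2}{\sigma^2} \sim \chi^2_n$ and the properties of the chi-square distribution ($E(\chi^2_k) = k$, $Var(\chi^2_k) = 2k$),
$$E(V^2) = \frac{\sigma^2}{n}E\left(\frac{nV^2}{\sigma^2}\right) = \frac{\sigma^2}{n}\,n = \sigma^2, \qquad Var(V^2) = \frac{\sigma^4}{n^2}Var\left(\frac{nV^2}{\sigma^2}\right) = \frac{\sigma^4}{n^2}\,2n = \frac{2\sigma^4}{n},$$
$$MSE(V^2) = [E(V^2) - \sigma^2]^2 + Var(V^2) = \frac{2\sigma^4}{n}.$$
Then,
- The estimator $V^2$ is unbiased for $\sigma^2$, whatever the sample size.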

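- The estimator $V^2$ is consistent in mean of order two, and therefore in probability, for $\sigma^2$, since $\lim_{n\to\infty} MSE(V^2) = \lim_{n\to\infty}\frac{2\sigma^4}{n} = 0$. The sample size tending to infinity is necessary and sufficient (see the mathematical appendix).
In another exercise, this estimator is compared with the other two estimators of the variance. For the expectation, it is easy to find in the literature direct calculations that lead to the same value for any variables (not necessarily normal).

(c2) For the quotient between the variances of the samples $V_X^2/V_Y^2$. By using $T = \frac{V_X^2/\sigma_X^2}{V_Y^2/\sigma_Y^2} \sim F_{n_X, n_Y}$ and the properties of the F distribution,
$$E\left(\frac{V_X^2}{V_Y^2}\right) = \frac{\sigma_X^2}{\sigma_Y^2}E(T) = \frac{\sigma_X^2}{\sigma_Y^2}\,\frac{n_Y}{n_Y-2}, \qquad n_Y > 2,$$
$$Var\left(\frac{V_X^2}{V_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}Var(T) = \frac{\sigma_X^4}{\sigma_Y^4}\,\frac{2n_Y^2(n_X+n_Y-2)}{n_X(n_Y-2)^2(n_Y-4)}, \qquad n_Y > 4,$$
$$MSE\left(\frac{V_X^2}{V_Y^2}\right) = \left[\frac{\sigma_X^2}{\sigma_Y^2}\frac{n_Y}{n_Y-2} - \frac{\sigma_X^2}{\sigma_Y^2}\right]^2 + Var\left(\frac{V_X^2}{V_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\left[\left(\frac{2}{n_Y-2}\right)^2 + \frac{2n_Y^2(n_X+n_Y-2)}{n_X(n_Y-2)^2(n_Y-4)}\right], \qquad n_Y > 4.$$
Then,
- The estimator $V_X^2/V_Y^2$ is biased for $\sigma_X^2/\sigma_Y^2$, but it is asymptotically unbiased, since
$$\lim E\left(\frac{V_X^2}{V_Y^2}\right) = \frac{\sigma_X^2}{\sigma_Y^2}\lim_{n_Y\to\infty}\frac{n_Y}{n_Y-2} = \frac{\sigma_X^2}{\sigma_Y^2}.$$
Mathematically, only $n_Y$ must tend to infinity. Statistically, since the populations can be named and allocated in either order, it is deduced that both sample sizes must tend to infinity. In fact, both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).
- The estimator $V_X^2/V_Y^2$ is consistent in mean of order two, and therefore in probability, for $\sigma_X^2/\sigma_Y^2$, since it is asymptotically unbiased and (dividing the numerator and the denominator by $n_Xn_Y^3$)
$$\lim Var\left(\frac{V_X^2}{V_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\lim\frac{2\left(\frac{1}{n_X} + \frac{1}{n_Y} - \frac{2}{n_Xn_Y}\right)}{\left(1-\frac{2}{n_Y}\right)^2\left(1-\frac{4}{n_Y}\right)} = 0.$$
The numerator tends to zero if and only if both sample sizes tend to infinity. In short, both sample sizes tending to infinity is necessary and sufficient (this limit has been studied in the mathematical appendix).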
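The closed-form mean square error of $V_X^2/V_Y^2$ can be compared with a direct simulation from two normal samples. In the R sketch below, `mse_ratio_V` encodes the formula just obtained (in units of $\sigma_X^4/\sigma_Y^4$ when `ratio = 1`), while `sim_ratio_V` is a hypothetical Monte Carlo helper that, as an illustrative assumption, takes both known means equal to 0.

```r
# Closed-form MSE of V_X^2 / V_Y^2 from the F(nX, nY) moments (valid for nY > 4);
# 'ratio' stands for sigmaX^2 / sigmaY^2.
mse_ratio_V <- function(nx, ny, ratio = 1) {
  bias <- ratio * 2 / (ny - 2)
  variance <- ratio^2 * 2 * ny^2 * (nx + ny - 2) / (nx * (ny - 2)^2 * (ny - 4))
  bias^2 + variance
}

# Hypothetical Monte Carlo check; the known means are taken as 0.
sim_ratio_V <- function(nx, ny, sx = 1, sy = 1, reps = 1e5) {
  vx <- rowMeans(matrix(rnorm(reps * nx, 0, sx), nrow = reps)^2)  # V_X^2 per sample
  vy <- rowMeans(matrix(rnorm(reps * ny, 0, sy), nrow = reps)^2)  # V_Y^2 per sample
  mean((vx / vy - sx^2 / sy^2)^2)
}

mse_ratio_V(10, 20)   # equals 4/9 for these sizes
set.seed(6)
sim_ratio_V(10, 20)   # should be close to 4/9
# The MSE decreases towards 0 only when both sizes grow
sapply(c(10, 100, 1000), function(n) mse_ratio_V(n, n))
```

Keeping one size fixed in the last line (for example `mse_ratio_V(10, n)`) shows the error levelling off instead of vanishing, in line with the need for both sizes to grow.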

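In another exercise, this estimator is compared with the other two estimators of the quotient of variances.

(c3) For the sample variance $s^2$. By using $T = \frac{ns^2}{\sigma^2} \sim \chi^2_{n-1}$ and the properties of the chi-square distribution,
$$E(s^2) = \frac{\sigma^2}{n}E\left(\frac{ns^2}{\sigma^2}\right) = \frac{n-1}{n}\sigma^2, \qquad Var(s^2) = \frac{\sigma^4}{n^2}Var\left(\frac{ns^2}{\sigma^2}\right) = \frac{2(n-1)}{n^2}\sigma^4,$$
$$MSE(s^2) = [E(s^2) - \sigma^2]^2 + Var(s^2) = \left[\frac{n-1}{n}\sigma^2 - \sigma^2\right]^2 + \frac{2(n-1)}{n^2}\sigma^4 = \frac{2n-1}{n^2}\sigma^4.$$
Then,
- The estimator $s^2$ is biased but asymptotically unbiased for $\sigma^2$, since $\lim_{n\to\infty} E(s^2) = \lim_{n\to\infty}\frac{n-1}{n}\sigma^2 = \sigma^2$. The sample size tending to infinity is necessary and sufficient (see the mathematical appendix).
- The estimator $s^2$ is consistent in mean of order two, and therefore in probability, for $\sigma^2$, since $\lim_{n\to\infty} MSE(s^2) = \lim_{n\to\infty}\frac{2n-1}{n^2}\sigma^4 = 0$. The sample size tending to infinity is necessary and sufficient (see the mathematical appendix).
In another exercise, this estimator is compared with the other two estimators of the variance. For the expectation, it is easy to find in the literature direct calculations that lead to the same value for any variables (not necessarily normal).

(c4) For the quotient between the sample variances $s_X^2/s_Y^2$. By using
$$T = \frac{\frac{n_Xs_X^2}{(n_X-1)\sigma_X^2}}{\frac{n_Ys_Y^2}{(n_Y-1)\sigma_Y^2}} \sim F_{n_X-1,\,n_Y-1}$$
and the properties of the F distribution,
$$E\left(\frac{s_X^2}{s_Y^2}\right) = \frac{n_Y(n_X-1)}{n_X(n_Y-1)}\frac{\sigma_X^2}{\sigma_Y^2}E(T) = \frac{n_Y(n_X-1)}{n_X(n_Y-1)}\frac{\sigma_X^2}{\sigma_Y^2}\frac{n_Y-1}{n_Y-3} = \frac{n_Y(n_X-1)}{n_X(n_Y-3)}\frac{\sigma_X^2}{\sigma_Y^2}, \qquad n_Y > 3,$$
$$Var\left(\frac{s_X^2}{s_Y^2}\right) = \left[\frac{n_Y(n_X-1)}{n_X(n_Y-1)}\right]^2\frac{\sigma_X^4}{\sigma_Y^4}\,\frac{2(n_Y-1)^2(n_X+n_Y-4)}{(n_X-1)(n_Y-3)^2(n_Y-5)} = \frac{2n_Y^2(n_X-1)(n_X+n_Y-4)}{n_X^2(n_Y-3)^2(n_Y-5)}\frac{\sigma_X^4}{\sigma_Y^4}, \qquad n_Y > 5.$$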

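$$MSE\left(\frac{s_X^2}{s_Y^2}\right) = \left[E\left(\frac{s_X^2}{s_Y^2}\right) - \frac{\sigma_X^2}{\sigma_Y^2}\right]^2 + Var\left(\frac{s_X^2}{s_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\left\{\left[\frac{n_Y(n_X-1)}{n_X(n_Y-3)} - 1\right]^2 + \frac{2n_Y^2(n_X-1)(n_X+n_Y-4)}{n_X^2(n_Y-3)^2(n_Y-5)}\right\}, \qquad n_Y > 5.$$
Then,
- The estimator $s_X^2/s_Y^2$ is biased for $\sigma_X^2/\sigma_Y^2$, but it is asymptotically unbiased, since
$$\lim E\left(\frac{s_X^2}{s_Y^2}\right) = \frac{\sigma_X^2}{\sigma_Y^2}\lim\left(1 - \frac{1}{n_X}\right)\frac{n_Y}{n_Y-3} = \frac{\sigma_X^2}{\sigma_Y^2}.$$
Both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).
- The estimator $s_X^2/s_Y^2$ is consistent in mean of order two, and therefore in probability, for $\sigma_X^2/\sigma_Y^2$, as it is asymptotically unbiased and (dividing the numerator and the denominator by $n_X^2n_Y^3$)
$$\lim Var\left(\frac{s_X^2}{s_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\lim\frac{2\left(1-\frac{1}{n_X}\right)\left(\frac{1}{n_X} + \frac{1}{n_Y} - \frac{4}{n_Xn_Y}\right)}{\left(1-\frac{3}{n_Y}\right)^2\left(1-\frac{5}{n_Y}\right)} = 0.$$
Both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).
In another exercise, this estimator is compared with the other two estimators of the quotient of variances.

(c5) For the sample quasivariance $S^2$. By using $T = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ and the properties of the chi-square distribution,
$$E(S^2) = \frac{\sigma^2}{n-1}E\left(\frac{(n-1)S^2}{\sigma^2}\right) = \sigma^2, \qquad Var(S^2) = \frac{\sigma^4}{(n-1)^2}Var\left(\frac{(n-1)S^2}{\sigma^2}\right) = \frac{\sigma^4}{(n-1)^2}\,2(n-1) = \frac{2\sigma^4}{n-1},$$
$$MSE(S^2) = [E(S^2) - \sigma^2]^2 + Var(S^2) = \frac{2\sigma^4}{n-1}.$$
Then,
- The estimator $S^2$ is unbiased for $\sigma^2$, whatever the sample size.
- The estimator $S^2$ is consistent in mean of order two, and therefore in probability, for $\sigma^2$, since $\lim_{n\to\infty} MSE(S^2) = \lim_{n\to\infty}\frac{2\sigma^4}{n-1} = 0$. The sample size tending to infinity is necessary and sufficient (see the mathematical appendix).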
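The three single-population formulas, $MSE(V^2) = 2\sigma^4/n$, $MSE(s^2) = (2n-1)\sigma^4/n^2$ and $MSE(S^2) = 2\sigma^4/(n-1)$, can be checked together by simulation. The R sketch below assumes a standard normal population ($\mu = 0$, $\sigma = 1$) and $n = 5$ for illustration; `mse_exact` and `mse_sim` are hypothetical names.

```r
# Exact MSEs for a normal population, as derived above
mse_exact <- function(n, sigma2 = 1) {
  c(V2 = 2 * sigma2^2 / n,
    s2 = (2 * n - 1) * sigma2^2 / n^2,
    S2 = 2 * sigma2^2 / (n - 1))
}

# Hypothetical Monte Carlo estimates; mu is treated as known for V^2.
mse_sim <- function(n, mu = 0, sigma = 1, reps = 1e5) {
  x <- matrix(rnorm(reps * n, mu, sigma), nrow = reps)  # one sample per row
  xbar <- rowMeans(x)
  ss <- rowSums((x - xbar)^2)        # sum of squared deviations from each row mean
  V2 <- rowMeans((x - mu)^2)
  s2 <- ss / n
  S2 <- ss / (n - 1)
  c(V2 = mean((V2 - sigma^2)^2),
    s2 = mean((s2 - sigma^2)^2),
    S2 = mean((S2 - sigma^2)^2))
}

mse_exact(5)   # 0.40, 0.36, 0.50
set.seed(4)
mse_sim(5)     # should be close to the exact values
```

The ordering $MSE(s^2) < MSE(V^2) < MSE(S^2)$ visible here is the one studied in another exercise.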

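In another exercise, this estimator is compared with the other two estimators of the variance. For the expectation, it is easy to find in the literature direct calculations that lead to the same value for any variables (not necessarily normal).

(c6) For the quotient between the sample quasivariances $S_X^2/S_Y^2$. By using $T = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{n_X-1,\,n_Y-1}$ and the properties of the F distribution,
$$E\left(\frac{S_X^2}{S_Y^2}\right) = \frac{\sigma_X^2}{\sigma_Y^2}\,\frac{n_Y-1}{n_Y-3}, \qquad n_Y > 3,$$
$$Var\left(\frac{S_X^2}{S_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\,\frac{2(n_Y-1)^2(n_X+n_Y-4)}{(n_X-1)(n_Y-3)^2(n_Y-5)}, \qquad n_Y > 5,$$
$$MSE\left(\frac{S_X^2}{S_Y^2}\right) = \left[\frac{\sigma_X^2}{\sigma_Y^2}\frac{n_Y-1}{n_Y-3} - \frac{\sigma_X^2}{\sigma_Y^2}\right]^2 + Var\left(\frac{S_X^2}{S_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\left[\left(\frac{2}{n_Y-3}\right)^2 + \frac{2(n_Y-1)^2(n_X+n_Y-4)}{(n_X-1)(n_Y-3)^2(n_Y-5)}\right], \qquad n_Y > 5.$$
Then,
- The estimator $S_X^2/S_Y^2$ is biased for $\sigma_X^2/\sigma_Y^2$, but it is asymptotically unbiased, since
$$\lim E\left(\frac{S_X^2}{S_Y^2}\right) = \frac{\sigma_X^2}{\sigma_Y^2}\lim_{n_Y\to\infty}\frac{n_Y-1}{n_Y-3} = \frac{\sigma_X^2}{\sigma_Y^2}.$$
Mathematically, only $n_Y$ must tend to infinity. Statistically, since the populations can be named and allocated in either order, it is deduced that both sample sizes must tend to infinity. In fact, both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).
- The estimator $S_X^2/S_Y^2$ is consistent in mean of order two, and therefore in probability, for $\sigma_X^2/\sigma_Y^2$, as it is asymptotically unbiased and (dividing the numerator and the denominator by $n_Xn_Y^3$)
$$\lim Var\left(\frac{S_X^2}{S_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\lim\frac{2\left(1-\frac{1}{n_Y}\right)^2\left(\frac{1}{n_X} + \frac{1}{n_Y} - \frac{4}{n_Xn_Y}\right)}{\left(1-\frac{1}{n_X}\right)\left(1-\frac{3}{n_Y}\right)^2\left(1-\frac{5}{n_Y}\right)} = 0.$$
Both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).
In another exercise, this estimator is compared with the other two estimators of the quotient of variances.

Conclusion: For the most important estimators, the mean square error has been calculated, either directly (in a few cases) or by making a proper statistic appear. The consistencies in mean of order two and in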

probability have been proved. Some limits of functions of two variables arose. These kinds of limits are not trivial in general, as there are infinitely many ways for the sizes to tend to infinity. Nevertheless, those appearing here could be calculated directly or after doing some simple algebraic transformation (multiplying and dividing by the proper quantity), as they were limits of sequences of the indeterminate form infinity-over-infinity.

On the other hand, it is worth noticing that in general there are several matters to be considered in selecting among different estimators of the same quantity:
(a) The error can be measured by using a quantity different from the mean square error.
(b) For large sample sizes, the differences provided by the formulas above may be negligible.
(c) The computational or manual effort in calculating the quantities must also be taken into account (not all of them require the same number of operations).
(d) We may have some quantities already available.

Exercise 13pe-p *
In the following situations, compare the mean square error of the following estimators when simple random samples, taken from normal populations, are considered:
(A) $V^2$, $s^2$ and $S^2$.
(B) $V_X^2/V_Y^2$, $s_X^2/s_Y^2$ and $S_X^2/S_Y^2$.
In the second section, consider only the case $n_X = n_Y = n$, and suppose that the populations are independent.

Discussion: The expressions of the mean square error of these estimators have been calculated in another exercise. Comparing the coefficients is easy in some cases, but sequences may sometimes cross one another, and then the comparisons must be done analytically (by solving equalities and inequalities) or graphically. We plot the sequences (lines between dots are used to facilitate the identification). The mean square errors were found for static situations, but the idea of limit involves dynamic situations. By using a computer, it is also possible to study (either analytically or graphically) the asymptotic behaviour of the estimators, but that is not a complete mathematical proof.
It is worth noticing that the formulas and results of this exercise are valid for normal populations because of the theoretical results on which they are based; in the general case, the expressions for the mean square error of these estimators are more complex. For two populations, there are infinitely many mathematical ways for the two sample sizes to tend to infinity (see the figure); the case $n_X = n_Y = n$, in the last figure, will be considered.

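(A) For $V^2$, $s^2$ and $S^2$
The expressions of their mean square errors are:
$$MSE(V^2) = \frac{2}{n}\sigma^4, \qquad MSE(s^2) = \frac{2n-1}{n^2}\sigma^4, \qquad MSE(S^2) = \frac{2}{n-1}\sigma^4.$$
Since $\sigma^4$ appears in all these positive quantities, by looking at the coefficients it is easy to see that, for any $n \ge 2$,
$$MSE(s^2) < MSE(V^2) < MSE(S^2),$$
because $\frac{2n-1}{n^2} < \frac{2n}{n^2} = \frac{2}{n} < \frac{2}{n-1}$. That is, the sequences indexed by $n$ do not cross one another. We can plot the coefficients (they are also the mean square errors when $\sigma = 1):

```r
# Grid of values for 'n'
n = seq(from=2, to=20, by=1)
# The three sequences of coefficients
coeff1 = 2/n
coeff2 = 2/n - 1/n^2
coeff3 = 2/(n-1)
# The plot
allvalues = c(coeff1, coeff2, coeff3)
ylim = c(min(allvalues), max(allvalues))
dev.new(); par(mfcol=c(1,4))
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 1', type='l')
plot(n, coeff2, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 2', type='b')
plot(n, coeff3, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 3', type='b')
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='All coefficients', type='l')
points(n, coeff2, type='b')
points(n, coeff3, type='b')
```

This code generates the following array of figures:
[Figure: four panels, 'Coefficients 1', 'Coefficients 2', 'Coefficients 3' and 'All coefficients', plotted against n.]
Asymptotically, the three estimators behave similarly, since $\frac{2}{n} \approx \frac{2n-1}{n^2} \approx \frac{2}{n-1}$ for large $n$.

(B) For $V_X^2/V_Y^2$, $s_X^2/s_Y^2$ and $S_X^2/S_Y^2$
The expressions of their mean square errors, when $n_X = n_Y = n$, are:
$$MSE\left(\frac{V_X^2}{V_Y^2}\right) = \left\{\left[\frac{2}{n-2}\right]^2 + \frac{2n^2(2n-2)}{n(n-2)^2(n-4)}\right\}\frac{\sigma_X^4}{\sigma_Y^4}, \qquad n > 4,$$
$$MSE\left(\frac{s_X^2}{s_Y^2}\right) = \left\{\left[\frac{n-1}{n-3} - 1\right]^2 + \frac{2(n-1)(2n-4)}{(n-3)^2(n-5)}\right\}\frac{\sigma_X^4}{\sigma_Y^4}, \qquad n > 5,$$
$$MSE\left(\frac{S_X^2}{S_Y^2}\right) = \left\{\left[\frac{2}{n-3}\right]^2 + \frac{2(n-1)^2(2n-4)}{(n-1)(n-3)^2(n-5)}\right\}\frac{\sigma_X^4}{\sigma_Y^4}, \qquad n > 5.$$
For equal sample sizes, the mean square errors of the last two estimators are the same; in fact, since $s^2 = \frac{n-1}{n}S^2$ for each sample, the two quotients coincide when $n_X = n_Y$. We can plot the coefficients (they are also the mean square errors when $\sigma_X = \sigma_Y$), for $n > 5$.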

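```r
# Grid of values for 'n'
n = seq(from=6, to=15, by=1)
# The three sequences of coefficients
coeff1 = (2/(n-2))^2 + 2*n*(2*n-2)/((n-2)^2*(n-4))
coeff2 = ((n-1)/(n-3) - 1)^2 + 2*(n-1)*(2*n-4)/((n-3)^2*(n-5))
coeff3 = coeff2
# The plot
allvalues = c(coeff1, coeff2, coeff3)
ylim = c(min(allvalues), max(allvalues))
dev.new(); par(mfcol=c(1,4))
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 1', type='l')
plot(n, coeff2, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 2', type='b')
plot(n, coeff3, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 3', type='b')
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='All coefficients', type='l')
points(n, coeff2, type='b')
points(n, coeff3, type='b')
```

This code generates the following array of figures:
[Figure: four panels, 'Coefficients 1', 'Coefficients 2', 'Coefficients 3' and 'All coefficients', plotted against n.]
This shows that, for normal populations and samples of sizes $n_X = n_Y = n$, it seems that
$$MSE\left(\frac{V_X^2}{V_Y^2}\right) < MSE\left(\frac{s_X^2}{s_Y^2}\right) = MSE\left(\frac{S_X^2}{S_Y^2}\right),$$
and the sequences do not cross one another. Really, a figure is not a mathematical proof, so we do the following calculations. First, the coefficients simplify:
$$\left(\frac{2}{n-2}\right)^2 + \frac{4n(n-1)}{(n-2)^2(n-4)} = \frac{4(n-4) + 4n(n-1)}{(n-2)^2(n-4)} = \frac{4(n^2-4)}{(n-2)^2(n-4)} = \frac{4(n+2)}{(n-2)(n-4)},$$
$$\left(\frac{2}{n-3}\right)^2 + \frac{2(n-1)(2n-4)}{(n-3)^2(n-5)} = \frac{4(n-5) + 4(n-1)(n-2)}{(n-3)^2(n-5)} = \frac{4(n-3)(n+1)}{(n-3)^2(n-5)} = \frac{4(n+1)}{(n-3)(n-5)}.$$
Then, since all the factors are positive for $n > 5$,
$$\frac{4(n+2)}{(n-2)(n-4)} < \frac{4(n+1)}{(n-3)(n-5)} \iff (n+2)(n-3)(n-5) < (n+1)(n-2)(n-4) \iff n^3 - 6n^2 - n + 30 < n^3 - 5n^2 + 2n + 8 \iff 22 < n^2 + 3n.$$
This inequality is true for $n > 5$, since it is true for $n = 6$ ($54 > 22$) and the right-hand side increases with $n$. Thus, we can guarantee that, for $n > 5$,
$$MSE\left(\frac{V_X^2}{V_Y^2}\right) < MSE\left(\frac{s_X^2}{s_Y^2}\right) = MSE\left(\frac{S_X^2}{S_Y^2}\right).$$
Asymptotically, by using infinities,
$$\lim_{n\to\infty} MSE\left(\frac{V_X^2}{V_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\lim_{n\to\infty}\left\{\left[\frac{2}{n-2}\right]^2 + \frac{4n(n-1)}{(n-2)^2(n-4)}\right\} = \frac{\sigma_X^4}{\sigma_Y^4}\lim_{n\to\infty}\frac{4}{n} = 0,$$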

$$\lim_{n\to\infty} MSE\left(\frac{s_X^2}{s_Y^2}\right) = \lim_{n\to\infty} MSE\left(\frac{S_X^2}{S_Y^2}\right) = \frac{\sigma_X^4}{\sigma_Y^4}\lim_{n\to\infty}\left\{\left[\frac{2}{n-3}\right]^2 + \frac{2(n-1)(2n-4)}{(n-3)^2(n-5)}\right\} = \frac{\sigma_X^4}{\sigma_Y^4}\lim_{n\to\infty}\frac{4}{n} = 0.$$
The three estimators behave similarly, since the quantitative behaviour of their mean square errors is characterized by the same limit, namely
$$\lim_{n\to\infty}\frac{4}{n}\,\frac{\sigma_X^4}{\sigma_Y^4} = 0.$$
It is worth noticing that this asymptotic behaviour arises when the limits are solved by using infinities; it cannot be seen when the limits are solved in other ways.

Conclusion: The expressions of the mean square error of these estimators allow us to compare them, to study their consistency and even their rate of convergence. We have proved the following result:
Proposition
(1) For a normal population, $MSE(s^2) < MSE(V^2) < MSE(S^2)$.
(2) For two independent normal populations, when $n_X = n_Y = n > 5$,
$$MSE\left(\frac{V_X^2}{V_Y^2}\right) < MSE\left(\frac{s_X^2}{s_Y^2}\right) = MSE\left(\frac{S_X^2}{S_Y^2}\right).$$
Note: For one population, $V^2$ has a higher error than $s^2$, even though the former uses the information about the value of the population mean $\mu$ while it is estimated in the other two estimators. For two populations, the information about the values of the two population means $\mu_X$ and $\mu_Y$ is used in the first quotient, while they must be estimated in the other two estimators. Either way, the population mean in itself does not play an important role in studying the variance, which is based on relative distances, but any estimation using the same data reduces the amount of information available and the degrees of freedom by one unit.

Again, it is worth noticing that in general there are several matters to be considered in selecting among different estimators of the same quantity:
(a) The error can be measured by using a quantity different from the mean square error.
(b) For large sample sizes, the differences provided by the formulas above may be negligible.
(c) The computational or manual effort in calculating the quantities must also be taken into account (not all of them require the same number of operations).
(d) We may have some quantities already available.
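The algebra of part (B) can be double-checked numerically. The R sketch below evaluates the coefficients of $\sigma_X^4/\sigma_Y^4$ for $n_X = n_Y = n$, compares them with the simplified forms $\frac{4(n+2)}{(n-2)(n-4)}$ and $\frac{4(n+1)}{(n-3)(n-5)}$, checks that the general-size coefficient of $S_X^2/S_Y^2$ reduces to that of $s_X^2/s_Y^2$ when the sizes are equal, and shows that $n$ times each coefficient approaches 4. The function names are hypothetical, and a finite grid is only a check, not a proof.

```r
# Coefficients of sigmaX^4/sigmaY^4 in the MSEs, for nX = nY = n > 5
coef_V <- function(n) {        # V_X^2 / V_Y^2
  (2 / (n - 2))^2 + 2 * n^2 * (2 * n - 2) / (n * (n - 2)^2 * (n - 4))
}
coef_s <- function(n) {        # s_X^2 / s_Y^2 with equal sizes
  ((n - 1) / (n - 3) - 1)^2 + 2 * (n - 1) * (2 * n - 4) / ((n - 3)^2 * (n - 5))
}

# General-size coefficient for S_X^2 / S_Y^2, from the F(nX - 1, nY - 1) moments
coef_S_general <- function(nx, ny) {
  m <- (ny - 1) / (ny - 3)                                   # mean of F
  v <- 2 * (ny - 1)^2 * (nx + ny - 4) / ((nx - 1) * (ny - 3)^2 * (ny - 5))
  (m - 1)^2 + v
}

n <- 6:200
all.equal(coef_V(n), 4 * (n + 2) / ((n - 2) * (n - 4)))   # TRUE
all.equal(coef_s(n), 4 * (n + 1) / ((n - 3) * (n - 5)))   # TRUE
all.equal(coef_s(n), coef_S_general(n, n))                # TRUE
all(coef_V(n) < coef_s(n))                                # TRUE

big <- c(10, 100, 1000, 10000)
big * coef_V(big)   # approaches 4, i.e. MSE ~ (4/n) sigmaX^4/sigmaY^4
big * coef_s(big)   # approaches 4 as well
```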

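Exercise 14pe-p *
For population variables $X$ and $Y$, simple random samples of sizes $n_X$ and $n_Y$ are taken. Calculate the mean square error of the following estimators (use results of previous exercises).
(A) For two independent Bernoulli populations with $\eta_X = \eta_Y = \eta$:
$$\frac{\hat\eta_X + \hat\eta_Y}{2} \qquad \text{and} \qquad \hat\eta_p = \frac{n_X\hat\eta_X + n_Y\hat\eta_Y}{n_X + n_Y}.$$
(B) For two independent normal populations with $\sigma_X = \sigma_Y = \sigma$:
$$\frac{V_X^2 + V_Y^2}{2}, \quad \frac{s_X^2 + s_Y^2}{2}, \quad \frac{S_X^2 + S_Y^2}{2}, \quad V_p^2 = \frac{n_XV_X^2 + n_YV_Y^2}{n_X + n_Y}, \quad s_p^2 = \frac{n_Xs_X^2 + n_Ys_Y^2}{n_X + n_Y}, \quad S_p^2 = \frac{(n_X-1)S_X^2 + (n_Y-1)S_Y^2}{n_X + n_Y - 2},$$
where $V_X^2$, $s_X^2$ and $S_X^2$ are defined as in previous exercises, and similarly for $Y$.
Try to compare the mean square errors. Study the consistency in mean of order two and then the consistency in probability.

Discussion: The expressions of the mean square error of the basic estimators involved in this exercise have been calculated in another exercise, and they will be used in calculating the mean square errors of the new estimators. The errors are calculated for static situations, but limits are studied in dynamic situations. Comparing the coefficients is easy in some cases, but sequences can sometimes cross one another, and then the comparisons must be done analytically (by solving equalities and inequalities) or graphically. By using a computer, it is also possible to study (either analytically or graphically) the behaviour of the estimators. The results obtained here are valid for two independent Bernoulli populations and two independent normal populations, respectively. On the other hand, we must find the expression of the error for the new estimators based on semisums:
$$MSE\left(\frac{\hat\theta_X + \hat\theta_Y}{2}\right) = \left[E\left(\frac{\hat\theta_X + \hat\theta_Y}{2}\right) - \theta\right]^2 + Var\left(\frac{\hat\theta_X + \hat\theta_Y}{2}\right)$$
and, for unbiased estimators computed from independent samples,
$$MSE\left(\frac{\hat\theta_X + \hat\theta_Y}{2}\right) = 0 + \frac{1}{4}\left[Var(\hat\theta_X) + Var(\hat\theta_Y)\right].$$

(A) For Bernoulli populations: $\frac{\hat\eta_X + \hat\eta_Y}{2}$ and $\hat\eta_p$
(a1) For the semisum of the sample proportions $\frac{\hat\eta_X + \hat\eta_Y}{2}$. By using previous results and that $\mu = \eta$ and $\sigma^2 = \eta(1-\eta)$,
$$E\left(\frac{\hat\eta_X + \hat\eta_Y}{2}\right) = \frac{1}{2}\left[E(\hat\eta_X) + E(\hat\eta_Y)\right] = \frac{\eta + \eta}{2} = \eta,$$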

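$$Var\left(\frac{\hat\eta_X + \hat\eta_Y}{2}\right) = \frac{1}{4}\left[Var(\hat\eta_X) + Var(\hat\eta_Y)\right] = \frac{1}{4}\left[\frac{\eta(1-\eta)}{n_X} + \frac{\eta(1-\eta)}{n_Y}\right] = \frac{\eta(1-\eta)}{4}\,\frac{n_X + n_Y}{n_Xn_Y},$$
$$MSE\left(\frac{\hat\eta_X + \hat\eta_Y}{2}\right) = \frac{\eta(1-\eta)}{4}\,\frac{n_X + n_Y}{n_Xn_Y}.$$
Then,
- The estimator $\frac{\hat\eta_X + \hat\eta_Y}{2}$ is unbiased for $\eta$, whatever the sample sizes.
- The estimator $\frac{\hat\eta_X + \hat\eta_Y}{2}$ is consistent in the mean-square sense, and therefore in probability, for $\eta$, since
$$\lim MSE\left(\frac{\hat\eta_X + \hat\eta_Y}{2}\right) = \frac{\eta(1-\eta)}{4}\lim\left(\frac{1}{n_X} + \frac{1}{n_Y}\right) = 0.$$
Both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).

(a2) For the pooled sample proportion $\hat\eta_p$. Firstly, we write $\hat\eta_p = \frac{n_X}{n_X+n_Y}\hat\eta_X + \frac{n_Y}{n_X+n_Y}\hat\eta_Y$. Now, by using previous results,
$$E(\hat\eta_p) = \frac{n_XE(\hat\eta_X) + n_YE(\hat\eta_Y)}{n_X + n_Y} = \frac{n_X\eta + n_Y\eta}{n_X + n_Y} = \eta,$$
$$Var(\hat\eta_p) = \frac{n_X^2Var(\hat\eta_X) + n_Y^2Var(\hat\eta_Y)}{(n_X + n_Y)^2} = \frac{n_X\eta(1-\eta) + n_Y\eta(1-\eta)}{(n_X + n_Y)^2} = \frac{\eta(1-\eta)}{n_X + n_Y},$$
$$MSE(\hat\eta_p) = \frac{\eta(1-\eta)}{n_X + n_Y}.$$
Then,
- The estimator $\hat\eta_p$ is unbiased for $\eta$, whatever the sample sizes.
- The estimator $\hat\eta_p$ is consistent in mean of order two, and therefore in probability, for $\eta$, since $\lim MSE(\hat\eta_p) = \lim \frac{\eta(1-\eta)}{n_X + n_Y} = 0$.
If this mean square error is compared with those of the two populations, we can see that the new denominator is the sum of both sample sizes. Again, it is worth noticing that at least one sample size tending to infinity, but not necessarily both, is necessary and sufficient. In this case, the denominator tends to infinity. The interpretation of this fact is that, in estimating $\eta$, one sample can do the whole work.

(a3) Comparison of $\frac{\hat\eta_X + \hat\eta_Y}{2}$ and $\hat\eta_p$
Case $n_X = n_Y = n$:
$$MSE\left(\frac{\hat\eta_X + \hat\eta_Y}{2}\right) = \frac{\eta(1-\eta)}{2n} = MSE(\hat\eta_p).$$
In fact, by looking at the expressions of the estimators themselves, $\hat\eta_p = \frac{\hat\eta_X + \hat\eta_Y}{2}$ in this case.
General case: The expressions of their mean square errors are (both estimators are unbiased):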

$$MSE\left(\frac{\hat\eta_X + \hat\eta_Y}{2}\right) = \frac{\eta(1-\eta)}{4}\,\frac{n_X + n_Y}{n_Xn_Y}, \qquad MSE(\hat\eta_p) = \frac{\eta(1-\eta)}{n_X + n_Y}.$$
Then
$$MSE\left(\frac{\hat\eta_X + \hat\eta_Y}{2}\right) - MSE(\hat\eta_p) = \eta(1-\eta)\,\frac{(n_X + n_Y)^2 - 4n_Xn_Y}{4n_Xn_Y(n_X + n_Y)} = \eta(1-\eta)\,\frac{(n_X - n_Y)^2}{4n_Xn_Y(n_X + n_Y)} \ge 0.$$
Then, the pooled estimator is always better than or equal to the semisum of the sample proportions. Both estimators have the same mean square error only when $n_X = n_Y$ (and then, as seen above, they coincide). Besides, the quantity $\frac{(n_X - n_Y)^2}{4n_Xn_Y(n_X + n_Y)}$ can be seen as a measure of the convenience of using the pooled sample proportion, since it shows how different the two errors are. The expression also shows a symmetric situation, in the sense that it does not matter which sample size is bigger: the measure depends on the squared difference. We have proved the following result:
Proposition
For two independent Bernoulli populations with the same parameter, the pooled sample proportion has a mean square error smaller than or equal to that of the semisum of the sample proportions. Besides, both are equivalent only when the sample sizes are equal.

We can plot the coefficients (the mean square errors divided by $\eta(1-\eta)$) for a sequence of sample sizes, indexed by $n$, such that $n_X = n$ and $n_Y = c\,n$ (in the code, $c = 2$, for example); but this is only one possible way for the sample sizes to tend to infinity:

```r
# Grid of values for 'n' (nX = n, nY = c*n)
c = 2
n = seq(from=1, to=20, by=1)
# The sequences of coefficients
coeff1 = (1 + 1/c)/(4*n)
coeff2 = 1/((1 + c)*n)
# The plot
allvalues = c(coeff1, coeff2)
ylim = c(min(allvalues), max(allvalues))
dev.new(); par(mfcol=c(1,3))
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 1', type='l')
plot(n, coeff2, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 2', type='b')
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='All coefficients', type='l')
points(n, coeff2, type='b')
```

This code generates the following array of figures:
[Figure: three panels, 'Coefficients 1', 'Coefficients 2' and 'All coefficients', plotted against n.]
The reader can repeat this figure by using values of c closer to and farther from 1.

(B) For normal populations
(b1) For the semisum of the variances of the samples $\frac{V_X^2 + V_Y^2}{2}$. By using previous results,

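$$E\left(\frac{V_X^2 + V_Y^2}{2}\right) = \frac{1}{2}\left[E(V_X^2) + E(V_Y^2)\right] = \frac{\sigma^2 + \sigma^2}{2} = \sigma^2,$$
$$Var\left(\frac{V_X^2 + V_Y^2}{2}\right) = \frac{1}{4}\left[Var(V_X^2) + Var(V_Y^2)\right] = \frac{1}{4}\left[\frac{2\sigma^4}{n_X} + \frac{2\sigma^4}{n_Y}\right] = \frac{n_X + n_Y}{2n_Xn_Y}\sigma^4,$$
$$MSE\left(\frac{V_X^2 + V_Y^2}{2}\right) = \frac{n_X + n_Y}{2n_Xn_Y}\sigma^4.$$
Then,
- The estimator $\frac{V_X^2 + V_Y^2}{2}$ is unbiased for $\sigma^2$, whatever the sample sizes.
- The estimator $\frac{V_X^2 + V_Y^2}{2}$ is consistent in the mean-square sense, and therefore in probability, for $\sigma^2$, since
$$\lim MSE\left(\frac{V_X^2 + V_Y^2}{2}\right) = \frac{\sigma^4}{2}\lim\left(\frac{1}{n_X} + \frac{1}{n_Y}\right) = 0.$$
Both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).

(b2) For the semisum of the sample variances $\frac{s_X^2 + s_Y^2}{2}$. By using previous results,
$$E\left(\frac{s_X^2 + s_Y^2}{2}\right) = \frac{1}{2}\left[\frac{n_X-1}{n_X}\sigma^2 + \frac{n_Y-1}{n_Y}\sigma^2\right] = \left[1 - \frac{n_X + n_Y}{2n_Xn_Y}\right]\sigma^2,$$
$$Var\left(\frac{s_X^2 + s_Y^2}{2}\right) = \frac{1}{4}\left[\frac{2(n_X-1)}{n_X^2}\sigma^4 + \frac{2(n_Y-1)}{n_Y^2}\sigma^4\right] = \frac{1}{2}\left[\frac{n_X-1}{n_X^2} + \frac{n_Y-1}{n_Y^2}\right]\sigma^4,$$
$$MSE\left(\frac{s_X^2 + s_Y^2}{2}\right) = \left[\frac{n_X + n_Y}{2n_Xn_Y}\right]^2\sigma^4 + \frac{1}{2}\left[\frac{n_X-1}{n_X^2} + \frac{n_Y-1}{n_Y^2}\right]\sigma^4 = \left[\frac{n_X + n_Y}{2n_Xn_Y} - \frac{(n_X - n_Y)^2}{4n_X^2n_Y^2}\right]\sigma^4.$$
Then,
- The estimator $\frac{s_X^2 + s_Y^2}{2}$ is biased but asymptotically unbiased for $\sigma^2$, since
$$\lim E\left(\frac{s_X^2 + s_Y^2}{2}\right) = \sigma^2\lim\left[1 - \frac{1}{2n_Y} - \frac{1}{2n_X}\right] = \sigma^2.$$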

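Both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).
- The estimator $\frac{s_X^2 + s_Y^2}{2}$ is consistent in the mean-square sense, and therefore in probability, for $\sigma^2$, because it is asymptotically unbiased and
$$\lim Var\left(\frac{s_X^2 + s_Y^2}{2}\right) = \frac{\sigma^4}{2}\lim\left[\frac{n_X-1}{n_X^2} + \frac{n_Y-1}{n_Y^2}\right] = 0.$$
Again, both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).

(b3) For the semisum of the sample quasivariances $\frac{S_X^2 + S_Y^2}{2}$. By using previous results,
$$E\left(\frac{S_X^2 + S_Y^2}{2}\right) = \frac{1}{2}\left[E(S_X^2) + E(S_Y^2)\right] = \frac{\sigma^2 + \sigma^2}{2} = \sigma^2,$$
$$Var\left(\frac{S_X^2 + S_Y^2}{2}\right) = \frac{1}{4}\left[Var(S_X^2) + Var(S_Y^2)\right] = \frac{1}{4}\left[\frac{2\sigma^4}{n_X-1} + \frac{2\sigma^4}{n_Y-1}\right] = \frac{n_X + n_Y - 2}{2(n_X-1)(n_Y-1)}\sigma^4,$$
$$MSE\left(\frac{S_X^2 + S_Y^2}{2}\right) = \frac{n_X + n_Y - 2}{2(n_X-1)(n_Y-1)}\sigma^4.$$
Then,
- The estimator $\frac{S_X^2 + S_Y^2}{2}$ is unbiased for $\sigma^2$, whatever the sample sizes.
- The estimator $\frac{S_X^2 + S_Y^2}{2}$ is consistent in the mean-square sense, and therefore in probability, for $\sigma^2$, since
$$\lim MSE\left(\frac{S_X^2 + S_Y^2}{2}\right) = \frac{\sigma^4}{2}\lim\left(\frac{1}{n_X-1} + \frac{1}{n_Y-1}\right) = 0.$$
Both sample sizes tending to infinity is necessary and sufficient (see the mathematical appendix).

(b4) For the pooled variance of the samples $V_p^2$. We can write $V_p^2 = \frac{n_X}{n_X+n_Y}V_X^2 + \frac{n_Y}{n_X+n_Y}V_Y^2$. By using previous results,
$$E(V_p^2) = \frac{n_X\sigma^2 + n_Y\sigma^2}{n_X + n_Y} = \sigma^2, \qquad Var(V_p^2) = \frac{n_X^2\frac{2\sigma^4}{n_X} + n_Y^2\frac{2\sigma^4}{n_Y}}{(n_X + n_Y)^2} = \frac{2\sigma^4}{n_X + n_Y},$$
$$MSE(V_p^2) = \frac{2\sigma^4}{n_X + n_Y}.$$
Then,
- The estimator $V_p^2$ is unbiased for $\sigma^2$, whatever the sample sizes.
- The estimator $V_p^2$ is consistent in mean of order two, and therefore in probability, for $\sigma^2$, since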

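$$\lim MSE(V_p^2) = \lim\frac{2\sigma^4}{n_X + n_Y} = 0.$$
It is worth noticing that at least one sample size tending to infinity, but not necessarily both, is necessary and sufficient. In this case, the denominator tends to infinity. The interpretation of this fact is that, in estimating $\sigma^2$, one sample can do the whole work.

(b5) For the pooled sample variance $s_p^2$. We can write $s_p^2 = \frac{n_X}{n_X+n_Y}s_X^2 + \frac{n_Y}{n_X+n_Y}s_Y^2$. By using previous results,
$$E(s_p^2) = \frac{n_X\frac{n_X-1}{n_X}\sigma^2 + n_Y\frac{n_Y-1}{n_Y}\sigma^2}{n_X + n_Y} = \frac{n_X + n_Y - 2}{n_X + n_Y}\sigma^2,$$
$$Var(s_p^2) = \frac{n_X^2\frac{2(n_X-1)}{n_X^2}\sigma^4 + n_Y^2\frac{2(n_Y-1)}{n_Y^2}\sigma^4}{(n_X + n_Y)^2} = \frac{2(n_X + n_Y - 2)}{(n_X + n_Y)^2}\sigma^4,$$
$$MSE(s_p^2) = \left[\frac{-2\sigma^2}{n_X + n_Y}\right]^2 + \frac{2(n_X + n_Y - 2)}{(n_X + n_Y)^2}\sigma^4 = \frac{2\sigma^4}{n_X + n_Y}.$$
Then,
- The estimator $s_p^2$ is biased for $\sigma^2$, but asymptotically unbiased, since $\lim \frac{n_X + n_Y - 2}{n_X + n_Y}\sigma^2 = \sigma^2$. The calculation above for the mean suggests that a $-2$ in the denominator of the definition would provide an unbiased estimator (see the estimator in the following section).
- The estimator $s_p^2$ is consistent in mean of order two, and therefore in probability, for $\sigma^2$, since $\lim MSE(s_p^2) = \lim \frac{2\sigma^4}{n_X + n_Y} = 0$.
It is worth noticing that at least one sample size tending to infinity, but not necessarily both, is necessary and sufficient. In this case, the denominator tends to infinity. The interpretation of this fact is that, in estimating $\sigma^2$, one sample can do the whole work.

(b6) For the bias-corrected pooled sample variance $S_p^2$. We can write $S_p^2 = \frac{(n_X-1)S_X^2 + (n_Y-1)S_Y^2}{n_X + n_Y - 2}$. By using previous results,
$$E(S_p^2) = \frac{(n_X-1)\sigma^2 + (n_Y-1)\sigma^2}{n_X + n_Y - 2} = \sigma^2, \qquad Var(S_p^2) = \frac{(n_X-1)^2\frac{2\sigma^4}{n_X-1} + (n_Y-1)^2\frac{2\sigma^4}{n_Y-1}}{(n_X + n_Y - 2)^2} = \frac{2\sigma^4}{n_X + n_Y - 2},$$
$$MSE(S_p^2) = \frac{2\sigma^4}{n_X + n_Y - 2}.$$
Then,
- The estimator $S_p^2$ is unbiased for $\sigma^2$, whatever the sample sizes.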
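The three pooled formulas, $MSE(V_p^2) = MSE(s_p^2) = \frac{2\sigma^4}{n_X+n_Y}$ and $MSE(S_p^2) = \frac{2\sigma^4}{n_X+n_Y-2}$, can be checked by simulating two independent normal samples with a common variance. In the R sketch below the means $\mu_X = 1$, $\mu_Y = -1$, the value $\sigma = 1$ and the sizes $n_X = 4$, $n_Y = 8$ are illustrative assumptions, and `mse_pooled_exact` and `mse_pooled_sim` are hypothetical names; results are reported in units of $\sigma^4$.

```r
# Exact MSEs (in units of sigma^4) for two independent normal samples
# with a common variance sigma^2
mse_pooled_exact <- function(nx, ny) {
  c(Vp = 2 / (nx + ny), sp = 2 / (nx + ny), Sp = 2 / (nx + ny - 2))
}

# Hypothetical Monte Carlo check; one pair of samples per row.
mse_pooled_sim <- function(nx, ny, mux = 1, muy = -1, sigma = 1, reps = 1e5) {
  x <- matrix(rnorm(reps * nx, mux, sigma), nrow = reps)
  y <- matrix(rnorm(reps * ny, muy, sigma), nrow = reps)
  ssx <- rowSums((x - rowMeans(x))^2)   # (nX - 1) * S_X^2 = nX * s_X^2
  ssy <- rowSums((y - rowMeans(y))^2)
  Vp <- (rowSums((x - mux)^2) + rowSums((y - muy)^2)) / (nx + ny)  # known means
  sp <- (ssx + ssy) / (nx + ny)
  Sp <- (ssx + ssy) / (nx + ny - 2)
  s2 <- sigma^2
  c(Vp = mean((Vp - s2)^2), sp = mean((sp - s2)^2), Sp = mean((Sp - s2)^2)) / s2^2
}

mse_pooled_exact(4, 8)   # 1/6, 1/6, 0.2
set.seed(5)
mse_pooled_sim(4, 8)     # should be close to the exact values
```

Note that $s_p^2$ trades a small bias for a smaller variance, ending with the same error as $V_p^2$, while the unbiased $S_p^2$ has the largest error of the three.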

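- The estimator $S_p^2$ is consistent in mean of order two, and therefore in probability, for $\sigma^2$, since $\lim MSE(S_p^2) = \lim \frac{2\sigma^4}{n_X + n_Y - 2} = 0$.
It is worth noticing that at least one sample size tending to infinity, but not necessarily both, is necessary and sufficient. In this case, the denominator tends to infinity. The interpretation of this fact is that, in estimating $\sigma^2$, one sample can do the whole work.

(b7) Comparison of $\frac{V_X^2+V_Y^2}{2}$, $\frac{s_X^2+s_Y^2}{2}$, $\frac{S_X^2+S_Y^2}{2}$, $V_p^2$, $s_p^2$ and $S_p^2$
Case $n_X = n_Y = n$:
$$MSE\left(\frac{V_X^2+V_Y^2}{2}\right) = \frac{\sigma^4}{n}, \quad MSE\left(\frac{s_X^2+s_Y^2}{2}\right) = \left[\frac{1}{n} - 0\right]\sigma^4 = \frac{\sigma^4}{n}, \quad MSE\left(\frac{S_X^2+S_Y^2}{2}\right) = \frac{\sigma^4}{n-1},$$
$$MSE(V_p^2) = \frac{\sigma^4}{n}, \qquad MSE(s_p^2) = \frac{\sigma^4}{n}, \qquad MSE(S_p^2) = \frac{\sigma^4}{n-1}.$$
Since $\sigma^4$ appears in all these positive quantities, by looking at the coefficients it is easy to see the relation
$$MSE\left(\frac{s_X^2+s_Y^2}{2}\right) = MSE\left(\frac{V_X^2+V_Y^2}{2}\right) = MSE(V_p^2) = MSE(s_p^2) < MSE(S_p^2) = MSE\left(\frac{S_X^2+S_Y^2}{2}\right).$$
For individual estimators, the order $MSE(s^2) < MSE(V^2) < MSE(S^2)$ was obtained in another exercise. This relation has been obtained for the case $n_X = n_Y$ and independent normal populations. We can plot the coefficients (they are also the mean square errors when $\sigma = 1$):

```r
# Grid of values for 'n'
n = seq(from=10, to=20, by=1)
# The six sequences of coefficients
coeff1 = 1/n
coeff2 = coeff1
coeff3 = 1/(n-1)
coeff4 = coeff1
coeff5 = coeff1
coeff6 = coeff3
# The plot
allvalues = c(coeff1, coeff2, coeff3, coeff4, coeff5, coeff6)
ylim = c(min(allvalues), max(allvalues))
dev.new(); par(mfcol=c(1,7))
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 1', type='l')
plot(n, coeff2, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 2', type='l')
plot(n, coeff3, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 3', type='b')
plot(n, coeff4, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 4', type='l')
plot(n, coeff5, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 5', type='l')
plot(n, coeff6, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='Coefficients 6', type='b')
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=ylim, xlab='n', ylab=' ', main='All coefficients', type='l')
points(n, coeff2, type='l')
points(n, coeff3, type='b')
points(n, coeff4, type='l')
points(n, coeff5, type='l')
points(n, coeff6, type='b')
```

This code generates the following array of figures:
[Figure: seven panels, 'Coefficients 1' to 'Coefficients 6' and 'All coefficients', plotted against n.]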

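By using this code, it is also possible to study (either analytically or graphically) the asymptotic behaviour of these estimators, but only for some particular values of $n$, which would not be a complete mathematical proof. It is worth noticing that the formulas obtained in this exercise are valid for normal populations because of the theoretical results on which they are based. In the general case, the expressions for the mean square error of these estimators are more complex.

General case: The expressions of their mean square errors are:
$$MSE\left(\frac{V_X^2+V_Y^2}{2}\right) = \frac{n_X + n_Y}{2n_Xn_Y}\sigma^4, \qquad MSE\left(\frac{s_X^2+s_Y^2}{2}\right) = \left[\frac{n_X + n_Y}{2n_Xn_Y} - \frac{(n_X - n_Y)^2}{4n_X^2n_Y^2}\right]\sigma^4,$$
$$MSE\left(\frac{S_X^2+S_Y^2}{2}\right) = \frac{n_X + n_Y - 2}{2(n_X-1)(n_Y-1)}\sigma^4, \qquad MSE(V_p^2) = MSE(s_p^2) = \frac{2\sigma^4}{n_X + n_Y}, \qquad MSE(S_p^2) = \frac{2\sigma^4}{n_X + n_Y - 2}.$$
We have simplified the expressions as much as possible, and now a general comparison can be tackled by doing some pairwise comparisons. Firstly, by looking at the coefficients,
$$MSE\left(\frac{s_X^2+s_Y^2}{2}\right) \le MSE\left(\frac{V_X^2+V_Y^2}{2}\right) < MSE\left(\frac{S_X^2+S_Y^2}{2}\right),$$
and the equality is reached only when $n_X = n_Y$. On the other hand,
$$MSE(V_p^2) = MSE(s_p^2) < MSE(S_p^2).$$
Now, we would like to allocate $V_p^2$, $s_p^2$ and $S_p^2$ in the first chain. To compare $V_p^2$ and $s_p^2$ with $\frac{V_X^2+V_Y^2}{2}$:
$$\frac{2}{n_X + n_Y} \le \frac{n_X + n_Y}{2n_Xn_Y} \iff 4n_Xn_Y \le (n_X + n_Y)^2 \iff 0 \le (n_X - n_Y)^2.$$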

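That is, $\mathrm{MSE}(V_p^2) = \mathrm{MSE}(s_p^2) \le \mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right)$, and the equality is attained only when $n_x = n_y$. To compare $S_p^2$ with $\frac{V_x^2+V_y^2}{2}$:
$$\mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right) - \mathrm{MSE}(S_p^2) = \frac{\sigma^4}{2}\,\frac{n_x+n_y}{n_xn_y} - \frac{2\sigma^4}{n_x+n_y-2} = \frac{\sigma^4\left[(n_x-n_y)^2 - 2(n_x+n_y)\right]}{2n_xn_y(n_x+n_y-2)}$$
That is,
$$\mathrm{MSE}(S_p^2)\ \begin{cases} \ge \mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right) & \text{if } (n_x-n_y)^2 \le 2(n_x+n_y),\\[4pt] < \mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right) & \text{if } (n_x-n_y)^2 > 2(n_x+n_y).\end{cases}$$
Intuitively, in the region around the bisector line the difference of the sample sizes is small, and therefore the pooled sample variance is worse. In the complementary region, the square of the difference is bigger than twice the sum of the sizes, and therefore the pooled sample variance is better. The frontier seems to be parabolic. Some work can be done to find the frontier determined by the equality, and the two regions on both sides of it (this is done in the mathematical appendix). Now, we write some brute-force lines for the computer to plot the points of the frontier:

# Brute force: integer pairs (nx, ny) satisfying 2(nx + ny) = (nx - ny)^2
N = 100
vectorNx = vector(mode = "numeric", length = 0)
vectorNy = vector(mode = "numeric", length = 0)
for (nx in 1:N) {
  for (ny in 1:N) {
    if (2*(nx+ny) == (nx-ny)^2) {
      vectorNx = c(vectorNx, nx)
      vectorNy = c(vectorNy, ny)
    }
  }
}
plot(vectorNx, vectorNy, xlim = c(0, N), ylim = c(0, N), xlab = 'nx', ylab = 'ny', main = 'Frontier of the region', type = 'p')

To compare $S_p^2$ with $\frac{S_x^2+S_y^2}{2}$:
$$\mathrm{MSE}\left(\tfrac{S_x^2+S_y^2}{2}\right) - \mathrm{MSE}(S_p^2) = \frac{\sigma^4}{2}\left(\frac{1}{n_x-1}+\frac{1}{n_y-1}\right) - \frac{2\sigma^4}{n_x+n_y-2} = \frac{\sigma^4(n_x-n_y)^2}{2(n_x-1)(n_y-1)(n_x+n_y-2)} \ge 0$$
That is, $\mathrm{MSE}(S_p^2) \le \mathrm{MSE}\left(\tfrac{S_x^2+S_y^2}{2}\right)$, and the equality is attained only if the sample sizes are the same. We can summarize all the results of this section in the following statement: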

Proposition. For two independent normal populations:
a) when $n_x = n_y = n$: $\mathrm{MSE}\left(\tfrac{s_x^2+s_y^2}{2}\right) = \mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right) = \mathrm{MSE}(V_p^2) = \mathrm{MSE}(s_p^2) < \mathrm{MSE}(S_p^2) = \mathrm{MSE}\left(\tfrac{S_x^2+S_y^2}{2}\right)$

In the general case, when the sample sizes can be different:
b) $\mathrm{MSE}\left(\tfrac{s_x^2+s_y^2}{2}\right) \le \mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right) < \mathrm{MSE}\left(\tfrac{S_x^2+S_y^2}{2}\right)$
c) $\mathrm{MSE}(V_p^2) = \mathrm{MSE}(s_p^2) < \mathrm{MSE}(S_p^2)$
d) $\mathrm{MSE}(V_p^2) = \mathrm{MSE}(s_p^2) \le \mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right)$
e) $\mathrm{MSE}(S_p^2) \ge \mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right)$ if $(n_x-n_y)^2 \le 2(n_x+n_y)$, and $\mathrm{MSE}(S_p^2) < \mathrm{MSE}\left(\tfrac{V_x^2+V_y^2}{2}\right)$ if $(n_x-n_y)^2 > 2(n_x+n_y)$
f) $\mathrm{MSE}(S_p^2) \le \mathrm{MSE}\left(\tfrac{S_x^2+S_y^2}{2}\right)$

In b), d) and f), the equality is attained when $n_x = n_y$.

Note: I have tried to compare $V_p^2$, $s_p^2$ and $S_p^2$ with $\frac{s_x^2+s_y^2}{2}$, but I have not managed to solve the inequalities.

On the other hand, these relations show that, for two independent normal populations, there exist estimators with smaller mean square error than the pooled sample variance $S_p^2$. Nevertheless, there are criteria other than the mean square error. Additionally, the pooled sample variance also has some advantages (see the advanced theory at the end).

Conclusion: For some pooled estimators, the mean square errors have been calculated either directly or by making a proper statistic appear. The consistencies in mean of order two and in probability have been proved. By using theoretical expressions for the mean square error, the pooled estimators have been compared with natural estimators consisting of the semisum of the individual estimators for each population. This has been done for the proportion (Bernoulli populations) and for the variance (normal populations). Once more, it is worth noticing that, in general, there are several matters to be considered in selecting among different estimators of the same quantity:
a) The error can be measured by using a quantity different from the mean square error.
b) For large sample sizes, the differences provided by the formulas above may be negligible.
c) The computational or manual effort in calculating the quantities must also be taken into account (not all of them require the same number of operations).
d) We may have some quantities already available.

Advanced Theory: The previous estimators can be written as a sum $\omega_1\hat\theta_1 + \omega_2\hat\theta_2$ with weights $\omega = (\omega_1, \omega_2)$ such that $\omega_1 + \omega_2 = 1$.
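To make the weights concrete, the R sketch below lists $\omega_1$ and $\omega_2$ for three estimators: the semisum, $V_p^2$ and $s_p^2$ (which share the same weights), and $S_p^2$. It also shows how the weight of the first estimator approaches 1 when only $n_x$ grows. This is an illustration added here, not part of the original exercise. `pool_weights` is a hypothetical helper, and the sample sizes are arbitrary.

```r
# Weights (w1, w2) of the individual estimators in w1*est_x + w2*est_y
pool_weights <- function(nx, ny) {
  rbind(semisum = c(w1 = 1/2, w2 = 1/2),                          # constant weights
        Vp_sp   = c(w1 = nx / (nx + ny), w2 = ny / (nx + ny)),    # V_p^2 and s_p^2
        Sp      = c(w1 = (nx - 1) / (nx + ny - 2),                # S_p^2
                    w2 = (ny - 1) / (nx + ny - 2)))
}

pool_weights(5, 15)

# ny fixed at 10 while nx increases: the weight of the first sample in S_p^2 tends to 1
w1_Sp <- sapply(c(10, 100, 1000, 10000), function(nx) pool_weights(nx, 10)["Sp", "w1"])
round(w1_Sp, 4)
```

With constant weights (the semisum), neither sample can ever receive more than half of the importance, whatever the sizes are.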
As regards the interpretation of the weights, they can be seen as a measure of the importance given to each estimator in the global formula. For some weights that depend on the sample sizes, one estimator can acquire all the importance when the sample sizes increase in the proper way. On the contrary, when the weights are constant, the possible effect (positive or negative) due to each estimator is bounded. The errors were calculated assuming that the data are representative of the population. If the quality of one sample is always poor and the weights do not depend on the sizes, the other sample cannot do the whole estimation.

My notes:

[PE] Methods and Properties

Exercise 1pe
We have reliable information that suggests the probability distribution with density function
$$f(x;\theta) = \frac{2}{\theta^2}(\theta - x), \qquad x \in [0,\theta],\ \theta > 0,$$
as a model for studying the population quantity X. Let $X = (X_1,\ldots,X_n)$ be a simple random sample.
a) Apply the method of the moments to find an estimator $\hat\theta_M$ of the parameter θ.
b) Calculate the bias and the mean square error of the estimator $\hat\theta_M$.
c) Study the consistency of $\hat\theta_M$.
d) Try to apply the maximum likelihood method to find an estimator $\hat\theta_{ML}$ of the parameter θ.
e) Obtain estimators of the mean and the variance.
Hint: (i) Use that $\mu = E(X) = \theta/3$ and $E(X^2) = \theta^2/6$.

Discussion: This statement is mathematical. The assumptions are supposed to have been checked. We are given the density function of the distribution of a dimensionless quantity X. The exercise involves two methods of estimation, the definition of the bias, the mean square error and the sufficient condition for the consistency (in probability). The first two population moments are provided.

Note: If E(X) and E(X²) had not been given in the statement, they could have been calculated by applying the definition and solving the integrals:
$$E(X) = \int_0^\theta x f(x;\theta)\,dx = \frac{2}{\theta^2}\int_0^\theta (\theta x - x^2)\,dx = \frac{2}{\theta^2}\left[\frac{\theta x^2}{2} - \frac{x^3}{3}\right]_0^\theta = \frac{2}{\theta^2}\cdot\frac{\theta^3}{6} = \frac{\theta}{3}$$
$$E(X^2) = \int_0^\theta x^2 f(x;\theta)\,dx = \frac{2}{\theta^2}\int_0^\theta (\theta x^2 - x^3)\,dx = \frac{2}{\theta^2}\left[\frac{\theta x^3}{3} - \frac{x^4}{4}\right]_0^\theta = \frac{2}{\theta^2}\cdot\frac{\theta^4}{12} = \frac{\theta^2}{6}$$

a) Method of the moments
a1) Population and sample moments: The distribution has only one parameter, so one equation suffices. By using the information in the hint:

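$$\mu_1(\theta) = \frac{\theta}{3} \qquad\text{and}\qquad m_1(x_1,\ldots,x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$$
a2) System of equations:
$$\mu_1(\theta) = m_1(x_1,\ldots,x_n) \;\Rightarrow\; \frac{\theta}{3} = \bar x \;\Rightarrow\; \theta_0 = 3\bar x = \frac{3}{n}\sum_{j=1}^n x_j$$
a3) The estimator: It is obtained after substituting the lower case letters $x_j$ by upper case letters $X_j$:
$$\hat\theta_M = \frac{3}{n}\sum_{j=1}^n X_j = 3\bar X$$

b) Bias and mean square error
b1) Bias: To apply the definition $b(\hat\theta_M) = E(\hat\theta_M) - \theta$, we need to calculate the expectation:
$$E(\hat\theta_M) = E(3\bar X) = 3E(\bar X) = 3E(X) = 3\cdot\frac{\theta}{3} = \theta,$$
where we have used the properties of the expectation, a property of the sample mean and the information in the statement. Now
$$b(\hat\theta_M) = E(\hat\theta_M) - \theta = \theta - \theta = 0,$$
and we can see that the estimator is unbiased (we could also see it from the calculation of the expectation).
b2) Mean square error: We do not usually apply the definition $\mathrm{MSE}(\hat\theta_M) = E[(\hat\theta_M - \theta)^2]$, but a property derived from it, for which we need to calculate the variance:
$$\mathrm{Var}(\hat\theta_M) = \mathrm{Var}(3\bar X) = 9\,\frac{\mathrm{Var}(X)}{n} = \frac{9}{n}\left[E(X^2) - E(X)^2\right] = \frac{9}{n}\left[\frac{\theta^2}{6} - \frac{\theta^2}{9}\right] = \frac{9}{n}\cdot\frac{\theta^2}{18} = \frac{\theta^2}{2n},$$
where we have used the properties of the variance, a property of the sample mean and the information in the statement. Then
$$\mathrm{MSE}(\hat\theta_M) = b(\hat\theta_M)^2 + \mathrm{Var}(\hat\theta_M) = 0^2 + \frac{\theta^2}{2n} = \frac{\theta^2}{2n}$$

c) Consistency: We try applying the sufficient condition $\lim_{n\to\infty}\mathrm{MSE}(\hat\theta) = 0$, or, equivalently, $\lim_{n\to\infty} b(\hat\theta) = 0$ and $\lim_{n\to\infty}\mathrm{Var}(\hat\theta) = 0$. Since
$$\lim_{n\to\infty}\mathrm{MSE}(\hat\theta_M) = \lim_{n\to\infty}\frac{\theta^2}{2n} = 0,$$
it is concluded that the estimator is consistent (in mean of order two and hence in probability) for estimating θ.

d) Maximum likelihood method
d1) Likelihood function: The density function is $f(x;\theta) = \frac{2}{\theta^2}(\theta - x)$ for $0 \le x \le \theta$, so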

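$$L(x_1,\ldots,x_n;\theta) = \prod_{j=1}^n f(x_j;\theta) = \frac{2^n}{\theta^{2n}}\prod_{j=1}^n(\theta - x_j),$$
which is valid for $\theta \ge \max_j x_j$. The likelihood is zero for smaller values of θ, since every $x_j$ must lie in $[0,\theta]$.
d2) Optimization problem: First, we try to find the maximum by applying the technique based on the derivatives. The logarithm function is applied,
$$\log L(x_1,\ldots,x_n;\theta) = n\log 2 - 2n\log\theta + \sum_{j=1}^n \log(\theta - x_j),$$
and the first-order condition leads to an equation that cannot be solved explicitly for θ:
$$0 = \frac{d}{d\theta}\log L(x_1,\ldots,x_n;\theta) = -\frac{2n}{\theta} + \sum_{j=1}^n \frac{1}{\theta - x_j} \;\Rightarrow\; \theta_0 = \;?$$
Then, we realize that global minima and maxima cannot always be found through the derivatives (only if they are also local extremes). In this case, it is difficult even to know whether L monotonically decreases with θ or not, since one part of L increases and another part decreases (which one changes more?). We study the j-th element of the product, that is, $f(x_j;\theta)$. Its first derivative is
$$f'(x_j;\theta) = \frac{d}{d\theta}\left[\frac{2(\theta - x_j)}{\theta^2}\right] = \frac{2\left[\theta^2 - 2\theta(\theta - x_j)\right]}{\theta^4} = \frac{2(2x_j - \theta)}{\theta^3},$$
so it has an extreme at $\theta = 2x_j$. This implies that L is the product of terms having their extremes at different places, so L does not change monotonically with the parameter θ.
d3) The estimator: $\hat\theta_{ML} = \;?$

e) Estimators of the mean and the variance
To obtain estimators of the mean, we take into account that $\mu = E(X) = \theta/3$ and apply the plug-in principle:
$$\hat\mu_M = \frac{\hat\theta_M}{3} = \frac{3\bar X}{3} = \bar X, \qquad \hat\mu_{ML} = \frac{\hat\theta_{ML}}{3} = \;?$$
To obtain estimators of the variance, since $\sigma^2 = \mathrm{Var}(X) = \frac{\theta^2}{6} - \frac{\theta^2}{9} = \frac{\theta^2}{18}$:
$$\hat\sigma^2_M = \frac{\hat\theta_M^2}{18} = \frac{(3\bar X)^2}{18} = \frac{\bar X^2}{2}, \qquad \hat\sigma^2_{ML} = \;?$$

Conclusion: The method of the moments is applied to obtain an estimator that is unbiased for any sample size and behaves well when used with many data (large n). The maximum likelihood method could not be applied, since it is difficult to optimize the likelihood function analytically, considering either its expression or the behaviour of the density function.

My notes:

Exercise 2pe
Let X be a random variable following the Rayleigh distribution, whose probability density function is
$$f(x;\theta) = \frac{x}{\theta^2}\,e^{-\frac{x^2}{2\theta^2}}, \qquad x \ge 0,\ \theta > 0,$$
such that $E(X) = \theta\sqrt{\pi/2}$ and $\mathrm{Var}(X) = \frac{4-\pi}{2}\theta^2$. Let $X = (X_1,\ldots,X_n)$ be a simple random sample.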

a) Apply the method of the moments to find an estimator $\hat\theta_M$ of the parameter θ.
b) For $\hat\theta_M$, calculate the bias and the mean square error, and study the consistency.
c) Apply the maximum likelihood method to find an estimator $\hat\theta_{ML}$ of the parameter θ.

CULTURAL NOTE (From: Wikipedia). In probability theory and statistics, the Rayleigh distribution is a continuous probability distribution for positive-valued random variables. A Rayleigh distribution is often observed when the overall magnitude of a vector is related to its directional components. One example where the Rayleigh distribution naturally arises is when wind velocity is analyzed into its orthogonal 2-dimensional vector components. Assume that the magnitudes of each component are uncorrelated and normally distributed with equal variance and zero mean. Then the overall wind speed (vector magnitude) will be characterized by a Rayleigh distribution. A second example arises in the case of random complex numbers whose real and imaginary components are i.i.d. (independently and identically distributed) Gaussian with equal variance and zero mean. In that case, the absolute value of the complex number is Rayleigh-distributed. The distribution is named after Lord Rayleigh.

Discussion: This is a theoretical exercise where we must apply two methods of point estimation. The basic properties must be considered for the estimator obtained through the first method.

Note: If E(X) had not been given in the statement, it could have been calculated by applying integration by parts (since polynomials and exponentials are functions of different type):
$$E(X) = \int_0^\infty x\,\frac{x}{\theta^2}e^{-\frac{x^2}{2\theta^2}}\,dx = \left[-x\,e^{-\frac{x^2}{2\theta^2}}\right]_0^\infty + \int_0^\infty e^{-\frac{x^2}{2\theta^2}}\,dx = 0 + \sqrt2\,\theta\int_0^\infty e^{-t^2}\,dt = \sqrt2\,\theta\,\frac{\sqrt\pi}{2} = \theta\sqrt{\frac{\pi}{2}},$$
where $\int u(x)v'(x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx$ has been used with
$$u = x,\qquad u' = 1,\qquad v' = \frac{x}{\theta^2}e^{-\frac{x^2}{2\theta^2}},\qquad v = -e^{-\frac{x^2}{2\theta^2}}.$$
Then, we have applied the change
$$\frac{x}{\sqrt2\,\theta} = t \;\Rightarrow\; x = \sqrt2\,\theta t \;\Rightarrow\; dx = \sqrt2\,\theta\,dt,$$
and $\int_0^\infty e^{-t^2}\,dt = \frac{\sqrt\pi}{2}$ has been used. We calculate the variance by using the first two moments.
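For the second moment, we can apply integration by parts again (the power of x decreases each time):
$$E(X^2) = \int_0^\infty x^2\,\frac{x}{\theta^2}e^{-\frac{x^2}{2\theta^2}}\,dx = \left[-x^2 e^{-\frac{x^2}{2\theta^2}}\right]_0^\infty + \int_0^\infty 2x\,e^{-\frac{x^2}{2\theta^2}}\,dx = 0 + 2\theta^2\left[-e^{-\frac{x^2}{2\theta^2}}\right]_0^\infty = 2\theta^2,$$
where the same formula has been used with
$$u = x^2,\qquad u' = 2x,\qquad v' = \frac{x}{\theta^2}e^{-\frac{x^2}{2\theta^2}},\qquad v = -e^{-\frac{x^2}{2\theta^2}}.$$
In substituting, we have taken into account that $e^x$ changes faster than $x^k$ for any k. The variance is
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = 2\theta^2 - \frac{\pi}{2}\theta^2 = \frac{4-\pi}{2}\theta^2.$$
On the other hand, an advanced table of integrals, like those physicists or engineers use, gives $\int_0^\infty e^{-ax^2}\,dx$ (see the appendixes of Mathematics) or $\int_0^\infty x^k e^{-ax^2}\,dx$ directly.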
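The two moments can also be checked numerically. This R sketch uses base R's `integrate()` with an arbitrary, hypothetical value θ = 1.7. It only confirms the formulas for that value and is not a proof.

```r
theta <- 1.7                                             # hypothetical parameter value
f <- function(x) x / theta^2 * exp(-x^2 / (2 * theta^2))  # Rayleigh density

total <- integrate(f, lower = 0, upper = Inf)$value      # should be 1
EX    <- integrate(function(x) x   * f(x), lower = 0, upper = Inf)$value
EX2   <- integrate(function(x) x^2 * f(x), lower = 0, upper = Inf)$value

# First row: numerical integrals; second row: closed-form expressions
rbind(numeric = c(total = total, EX = EX, EX2 = EX2, Var = EX2 - EX^2),
      formula = c(1, theta * sqrt(pi / 2), 2 * theta^2, (4 - pi) / 2 * theta^2))
```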

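a) Method of the moments
Since only one parameter appears in the density function, one equation suffices. Moreover, since the expression of $\mu = E(X)$ involves θ, the equation and its solution are:
$$\mu_1(\theta) = \theta\sqrt{\frac{\pi}{2}} = \bar x \;\Rightarrow\; \theta_0 = \sqrt{\frac{2}{\pi}}\,\bar x \;\Rightarrow\; \hat\theta_M = \sqrt{\frac{2}{\pi}}\,\bar X$$

b) Bias, mean square error and consistency
Mean or expectation:
$$E(\hat\theta_M) = \sqrt{\frac{2}{\pi}}\,E(\bar X) = \sqrt{\frac{2}{\pi}}\,E(X) = \sqrt{\frac{2}{\pi}}\,\theta\sqrt{\frac{\pi}{2}} = \theta$$
Bias: $b(\hat\theta_M) = E(\hat\theta_M) - \theta = \theta - \theta = 0$, so $\hat\theta_M$ is an unbiased estimator of θ.
Variance:
$$\mathrm{Var}(\hat\theta_M) = \frac{2}{\pi}\,\mathrm{Var}(\bar X) = \frac{2}{\pi}\,\frac{\mathrm{Var}(X)}{n} = \frac{2}{\pi}\cdot\frac{4-\pi}{2n}\,\theta^2 = \frac{4-\pi}{\pi n}\,\theta^2$$
Mean square error:
$$\mathrm{MSE}(\hat\theta_M) = b(\hat\theta_M)^2 + \mathrm{Var}(\hat\theta_M) = 0 + \frac{4-\pi}{\pi n}\,\theta^2$$
Consistency:
$$\lim_{n\to\infty}\mathrm{MSE}(\hat\theta_M) = \lim_{n\to\infty}\frac{4-\pi}{\pi n}\,\theta^2 = 0,$$
and therefore $\hat\theta_M$ is consistent for θ.

c) Maximum likelihood method
Likelihood function:
$$L(X;\theta) = \prod_{j=1}^n f(x_j;\theta) = \frac{x_1}{\theta^2}e^{-\frac{x_1^2}{2\theta^2}}\cdots\frac{x_n}{\theta^2}e^{-\frac{x_n^2}{2\theta^2}} = \frac{\prod_j x_j}{\theta^{2n}}\,e^{-\frac{\sum_j x_j^2}{2\theta^2}}$$
Log-likelihood function: To facilitate the differentiation, $\theta^{2n}$ is moved to the numerator and a property of the logarithm is applied:
$$\log L(X;\theta) = \log\Big(\prod_j x_j\Big) - 2n\log\theta - \frac{\sum_j x_j^2}{2\theta^2}$$
Search for the maximum:
$$0 = \frac{d}{d\theta}\log L(X;\theta) = -\frac{2n}{\theta} + \frac{\sum_j x_j^2}{\theta^3} \;\Rightarrow\; 2n\theta^2 = \sum_j x_j^2 \;\Rightarrow\; \theta_0 = \sqrt{\frac{\sum_j x_j^2}{2n}}$$
Now we check the condition on the second derivative:
$$\frac{d^2}{d\theta^2}\log L(X;\theta) = -\frac{3\sum_j x_j^2}{\theta^4} + \frac{2n}{\theta^2}$$
The first term is negative and the second is positive, but it is difficult to check qualitatively whether the first is larger in absolute value than the second. Then, the extreme obtained is substituted:
$$\frac{d^2}{d\theta^2}\log L(X;\theta_0) = -\frac{3\sum_j x_j^2}{\left(\frac{\sum_j x_j^2}{2n}\right)^2} + \frac{2n}{\frac{\sum_j x_j^2}{2n}} = -\frac{12n^2}{\sum_j x_j^2} + \frac{4n^2}{\sum_j x_j^2} = -\frac{8n^2}{\sum_j x_j^2} < 0$$
Thus, the extreme is really a maximum.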

The estimator:
$$\hat\theta_{ML} = \sqrt{\frac{1}{2n}\sum_{j=1}^n X_j^2}$$
Discussion: For the Rayleigh distribution, the two methods provide different estimators of the parameter. In the first case, we could easily calculate the mean and the variance, as the estimator was linear in the $X_j$. In the second case, however, the nonlinearities ($X_j^2$ and the square root) make those calculations difficult.

My notes:

Exercise 3pe
Before commercializing a new model of light bulb, a deep statistical study on its duration (measured in days, d) must be carried out. The population variable duration is expected to follow the exponential probability model
$$f(x;\lambda) = \lambda e^{-\lambda x}, \qquad x \ge 0,\ \lambda > 0.$$
Let $X = (X_1,\ldots,X_n)$ be a simple random sample. Then, we want to:
a) Apply the method of the moments to find an estimator of the parameter λ.
b) Apply the maximum likelihood method to find an estimator of the parameter λ.
c) Find a sufficient statistic (see the hint below).
d) Prove that $\bar X$ is not an efficient estimator of λ.
e) Prove that $\bar X$ is a consistent estimator of $\lambda^{-1}$.
f) Prove that $\bar X$ is an efficient estimator of $\lambda^{-1}$. To cope with this, use the following alternative, equivalent notation in terms of $\theta = \lambda^{-1}$:
$$f(x;\theta) = \frac{1}{\theta}e^{-x/\theta}, \qquad x \ge 0,\ \theta > 0.$$
Now you must prove that $\bar X$ is an efficient estimator of θ. Note that you can easily calculate $\frac{d\theta}{d\theta} = 1$, while $\frac{d\lambda^{-1}}{d\lambda} = -\lambda^{-2}$.
g) The empirical part of the study, based on the measurement of 55 independent light bulbs, has yielded a total sum of $\sum_{j=1}^{55} x_j = 598$ d (only experts can calculate it). Introduce this information in the expressions obtained in previous sections to give final estimates of λ.
h) Give an estimate of the mean $\mu = E(X)$.
Hint: For section c), apply the factorization theorem and make it clear what the two parts are. In the theorem: (1) g and h are nonnegative; (2) T cannot depend on θ, and g depends only on the sample and the parameter, depending on the sample through T; (3) h can be 1; and (4) since h is any function of the sample, it may involve T.

Discussion: First of all, the supposition that the exponential distribution can reasonably be used to model the variable duration should be tested.
One aim of this exercise is to show how many methods and properties involved in previous exercises can be involved in the same statistical analysis. The quality of the estimators obtained is studied. See the appendixes to see how the mean and the variance of this distribution could be calculated, if necessary.

a) Method of the moments
a1) Population and sample moments: The population distribution has only one parameter, so one equation suffices. The first-order moments of the model X and the sample x are, respectively,
$$\mu_1(\lambda) = E(X) = \frac{1}{\lambda} \qquad\text{and}\qquad m_1(x_1,\ldots,x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$$
a2) System of equations: Since the parameter of interest λ appears in the first moment of X, the solution is:
$$\mu_1(\lambda) = m_1(x_1,\ldots,x_n) \;\Rightarrow\; \frac{1}{\lambda} = \bar x \;\Rightarrow\; \lambda_0 = \frac{n}{\sum_{j=1}^n x_j} = \frac{1}{\bar x}$$
a3) The estimator:
$$\hat\lambda_M = \frac{n}{\sum_{j=1}^n X_j} = \frac{1}{\bar X}$$

b) Maximum likelihood method
b1) Likelihood function: For an exponential random variable, the density function is $f(x;\lambda) = \lambda e^{-\lambda x}$, so we write the product and join the terms that are similar:
$$L(x_1,\ldots,x_n;\lambda) = \prod_{j=1}^n f(x_j;\lambda) = \lambda e^{-\lambda x_1}\,\lambda e^{-\lambda x_2}\cdots\lambda e^{-\lambda x_n} = \lambda^n e^{-\lambda\sum_{j=1}^n x_j}$$
b2) Optimization problem: The logarithm function is applied to make calculations easier:
$$\log L(x_1,\ldots,x_n;\lambda) = \log(\lambda^n) + \log\left(e^{-\lambda\sum_j x_j}\right) = n\log\lambda - \lambda\sum_{j=1}^n x_j$$
The population distribution has only one parameter, and hence a one-dimensional function must be maximized. To find the local (or relative) extreme values, the necessary condition is:
$$0 = \frac{d}{d\lambda}\log L(x_1,\ldots,x_n;\lambda) = \frac{n}{\lambda} - \sum_{j=1}^n x_j \;\Rightarrow\; \lambda_0 = \frac{n}{\sum_{j=1}^n x_j} = \frac{1}{\bar x}$$
To verify that the only candidate is a local maximum, the sufficient condition is:
$$\frac{d^2}{d\lambda^2}\log L(x_1,\ldots,x_n;\lambda) = -\frac{n}{\lambda^2} < 0,$$
which holds for any value, particularly for $\lambda_0 = 1/\bar x$.
b3) The estimator:
$$\hat\lambda_{ML} = \frac{n}{\sum_{j=1}^n X_j} = \frac{1}{\bar X}$$

c) Sufficient statistic
Both to prove that a given statistic is sufficient and to find a sufficient statistic, we apply the factorization theorem (see the hint).
c1) Likelihood function: Computed previously, it is
$$L(X_1,\ldots,X_n;\lambda) = \lambda^n e^{-\lambda\sum_{j=1}^n X_j}.$$

c2) Theorem: $L(X_1,\ldots,X_n;\lambda) = g(T(X_1,\ldots,X_n);\lambda)\cdot h(X_1,\ldots,X_n)$. We must analyse each term of the likelihood function. $\lambda^n$ depends only on the parameter, so it would be part of g. $e^{-\lambda\sum_j X_j}$ depends on both the parameter and the sample, and it is not possible to separate both types of information mathematically; then, this term would be part of g too. Moreover, the only candidate to be a sufficient statistic is $T = T(X_1,\ldots,X_n) = \sum_{j=1}^n X_j$.
Since the condition holds for $g(T;\lambda) = \lambda^n e^{-\lambda T}$ and $h(X_1,\ldots,X_n) = 1$, the statistic $T = \sum_{j=1}^n X_j$ is sufficient. This means that it summarizes the important information about the parameter contained in the sample. The previous statistic contains the same information as any one-to-one transformation of it, concretely the sample mean $\tilde T = \frac{1}{n}\sum_{j=1}^n X_j = \bar X$.

d) $\bar X$ is not an efficient estimator of λ
The definition of efficiency consists of two conditions: unbiasedness and minimum variance (the latter is checked by comparing the variance with the Cramér-Rao bound).
d1) Unbiasedness: By applying a property of the sample mean and the information of the statement,
$$E(\bar X) = E(X) = \frac{1}{\lambda} \;\Rightarrow\; b(\bar X) = E(\bar X) - \lambda = \frac{1}{\lambda} - \lambda \ne 0$$
The first condition does not hold for all values of λ, and hence it is not necessary to check the second one.
Note: The previous bias is zero when $\frac{1}{\lambda} - \lambda = 0 \Leftrightarrow \lambda^2 = 1 \Leftrightarrow \lambda = \pm 1$. For f(x) to be a probability function, λ must be positive, so the solution −1 is not taken into account. Thus, when λ = 1, the estimator may still be efficient if the second condition holds.

e) $\bar X$ is a consistent estimator of $\lambda^{-1}$
To prove the consistency in probability, we will apply the following sufficient condition (consistency in mean of order two):
$$\lim_{n\to\infty}\mathrm{MSE}(\hat\theta) = 0 \;\Leftrightarrow\; \lim_{n\to\infty} b(\hat\theta) = 0 \ \text{ and } \ \lim_{n\to\infty}\mathrm{Var}(\hat\theta) = 0$$
e1) Bias: By applying a property of the sample mean and the information of the statement,
$$E(\bar X) = E(X) = \lambda^{-1} \;\Rightarrow\; b(\bar X) = E(\bar X) - \lambda^{-1} = 0 \;\Rightarrow\; \lim_{n\to\infty} b(\bar X) = 0$$
e2) Variance: By applying a property of the sample mean and the information of the statement,
$$\mathrm{Var}(\bar X) = \frac{\mathrm{Var}(X)}{n} = \frac{1}{n\lambda^2} \;\Rightarrow\; \lim_{n\to\infty}\mathrm{Var}(\bar X) = 0$$
As a conclusion, the mean square error (MSE) tends to zero, which is sufficient, but not necessary, for the consistency in probability.

f) $\bar X$ is an efficient estimator of $\lambda^{-1}$
Now, we are recommended to use the notation
$$f(x;\theta) = \frac{1}{\theta}e^{-x/\theta},$$

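where $\theta = \lambda^{-1}$.
f1) Unbiasedness: By applying a property of the sample mean and the information of the statement,
$$E(\bar X) = E(X) = \theta \;\Rightarrow\; b(\bar X) = E(\bar X) - \theta = 0$$
The first condition holds, and hence it is necessary to check the second one.
f2) Minimum variance: We compare the variance and the Cramér-Rao bound. The variance is:
$$\mathrm{Var}(\bar X) = \frac{\mathrm{Var}(X)}{n} = \frac{\theta^2}{n}$$
On the other hand, the bound is calculated step by step:
i. Function with X in place of x: $f(X;\theta) = \frac{1}{\theta}e^{-X/\theta}$
ii. Logarithm of the function: $\log f(X;\theta) = -\log\theta - \frac{X}{\theta}$
iii. Derivative of the logarithm of the function:
$$\frac{\partial}{\partial\theta}\log f(X;\theta) = -\frac{1}{\theta} + \frac{X}{\theta^2} = \frac{X - \theta}{\theta^2}$$
iv. Expectation of the squared partial derivative of the logarithm of the function: We rewrite the expression so as to make $\sigma^2 = \mathrm{Var}(X) = E[(X - E(X))^2] = E[(X - \mu)^2]$ appear. In this case, it also holds that $\sigma^2 = \mathrm{Var}(X) = \theta^2$. Then
$$E\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^2\right] = E\left[\frac{(X-\theta)^2}{\theta^4}\right] = \frac{\theta^2}{\theta^4} = \frac{1}{\theta^2}$$
v. Theoretical Cramér-Rao lower bound:
$$\frac{1}{n\,E\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^2\right]} = \frac{\theta^2}{n}$$
The variance of the estimator attains the bound, so the estimator has minimum variance. The fulfillment of the two conditions proves that $\bar X$ is an efficient estimator of $\lambda^{-1} = \theta$.

g) Estimation of λ
It is necessary to use the only information available: $\sum_{j=1}^{55} x_j = 598$ d. From the method of the moments:
$$\hat\lambda_M = \frac{n}{\sum_j x_j} = \frac{55}{598\text{ d}} = 0.09197\ \text{d}^{-1}$$
From the maximum likelihood method, since the same estimator was obtained: $\hat\lambda_{ML} = 0.09197\ \text{d}^{-1}$.

h) Estimation of μ
Since $\mu = E(X) = \frac{1}{\lambda}$, an estimator of λ induces, by applying the plug-in principle, an estimator of μ: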

From the method of the moments:
$$\hat\mu_M = \frac{1}{\hat\lambda_M} = \frac{598\text{ d}}{55} = 10.87\text{ d}$$
From the maximum likelihood method: $\hat\mu_{ML} = 10.87$ d.
Note: From the numerical point of view, calculating 598/55 is expected to have a smaller error than calculating 1/0.091973.
Conclusion: We can see that, for the exponential model, the two methods provide the same estimator for λ. The estimator obtained has been used to obtain an estimator of the population mean. The mean duration estimate of the new model of light bulb was 10.87 days. On the other hand, some desirable properties of the estimator $\bar X$ have been proved. A different, equivalent notation has been used to facilitate the proof of one of these properties, which emphasizes the importance of the notation in doing calculations.

My notes:
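The arithmetic of sections g) and h), and the efficiency result of section f), can be reproduced in R. The Monte Carlo part rests on an assumption not made in the exercise. It takes θ = 598/55 d as if it were the true mean and simulates many samples of size 55. It then compares the empirical variance of $\bar X$ with the Cramér-Rao bound θ²/n. The seed and the number of replicates are arbitrary choices.

```r
n <- 55; sum_x <- 598
lambda_hat <- n / sum_x        # method of moments = maximum likelihood, in 1/d
mu_hat <- sum_x / n            # plug-in estimate of the mean, in d
round(c(lambda_hat = lambda_hat, mu_hat = mu_hat), 5)   # 0.09197 and 10.87273

# Monte Carlo illustration of efficiency, assuming theta = mu_hat is the true mean
set.seed(2024)
theta <- mu_hat
xbars <- replicate(20000, mean(rexp(n, rate = 1 / theta)))
var_mc <- var(xbars)           # empirical variance of the sample mean
cr_bound <- theta^2 / n        # Cramer-Rao lower bound for unbiased estimators of theta
c(var_mc = var_mc, cr_bound = cr_bound)
```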

Confidence Intervals

[CI] Methods for Estimating
Remark 1ci: Confidence can be interpreted as a probability (so it is), although we sometimes use a 0-to-100 scale. See the remark on the interpretation of the concept of probability in the appendix of Probability Theory.
Remark 2ci: There is an infinite number of pairs of quantiles $a_1$ and $a_2$ such that $P(a_1 \le T \le a_2) = 1-\alpha$. By convention, the pair determining tails of probability α/2 is chosen. This criterion is also applied for two-tailed hypothesis tests.
Remark 3ci: When the Central Limit Theorem can be applied, asymptotic results on averages are relatively independent of the initial population. Therefore, in some exercises, no suppositions are made on the distribution of the population variables.

Exercise 1ci-m
To forecast the yearly inflation (in percent, %), a simple random sample has been gathered:
1.5   2.1   1.9   2.3   2.5   3.2   3.0
It is assumed that the variable inflation follows a normal distribution.
a) By using these data, construct a 99% confidence interval for the mean of the inflation.
b) Experts are of the opinion that the previous interval is too wide, and they want a total length of one unit. Find the level of confidence for this new interval.
c) Construct a confidence interval of 90% for the standard deviation.
Discussion: The intervals will be built by applying the method of the pivot, and then the expression of the margin of error is determined. Variances are nonnegative by definition, and the positive branch of the square root function is strictly increasing. Therefore, the interval for the standard deviation is obtained by applying the square root to the interval for the variance.
Identification of the variable: X = predicted inflation of one country, X ~ N(μ, σ²).
Sample information: Theoretical (simple random) sample: $X_1,\ldots,X_7$ s.r.s. (n = 7). Empirical sample: $x_1,\ldots,x_7$ = 1.5, 2.1, 1.9, 2.3, 2.5, 3.2, 3.0. In this exercise, we know the values $x_i$ of the sample. This allows calculating any quantity we want.
a) Confidence interval for the mean: To choose the proper pivot, we take into account the following:
The variable of interest follows a normal distribution.
The population variance σ² is unknown, so it must be estimated by the sample quasivariance.
The sample size is small, n = 7, so we should not think about the asymptotic framework.
From a table of statistics (e.g. in [T]), the pivot

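$$T(X;\mu) = \frac{\bar X - \mu}{\sqrt{S^2/n}} \sim t_{n-1}$$
is selected. Then
$$1-\alpha = P\left(l_{\alpha/2} \le T(X;\mu) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{\bar X - \mu}{\sqrt{S^2/n}} \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2}\sqrt{\frac{S^2}{n}} \le \bar X - \mu \le r_{\alpha/2}\sqrt{\frac{S^2}{n}}\right) = P\left(\bar X - r_{\alpha/2}\sqrt{\frac{S^2}{n}} \le \mu \le \bar X + r_{\alpha/2}\sqrt{\frac{S^2}{n}}\right),$$
so
$$I_{1-\alpha} = \left[\bar X - r_{\alpha/2}\sqrt{\frac{S^2}{n}},\ \bar X + r_{\alpha/2}\sqrt{\frac{S^2}{n}}\right],$$
where $r_{\alpha/2}$ is the quantile such that $P(T > r_{\alpha/2}) = \alpha/2$.
Let us calculate the quantities in the formula:
$$\bar x = \frac{1}{7}\sum_{j=1}^7 x_j = \frac{16.5\%}{7} = 2.36\%$$
The level of confidence is 99%, and hence α = 0.01. The quantile is found in the table of the t distribution with κ = 7 − 1 = 6 degrees of freedom:
$$r_{\alpha/2} = r_{0.01/2} = r_{0.005} = 3.71$$
By using the data,
$$S^2 = \frac{1}{7-1}\sum_{j=1}^7 (x_j - \bar x)^2 = \frac{1}{6}\left[(1.5\% - 2.36\%)^2 + \cdots + (3.0\% - 2.36\%)^2\right] = 0.36\%^2$$
Finally, the interval is
$$I_{0.99} = \left[2.36\% - 3.71\sqrt{\frac{0.36\%^2}{7}},\ 2.36\% + 3.71\sqrt{\frac{0.36\%^2}{7}}\right] = [1.52\%,\ 3.20\%],$$
whose length is 3.20% − 1.52% = 1.68%.
b) Confidence level: The length of the interval (the distance between the two endpoints) is twice the margin of error when T follows a symmetric distribution:
$$L = \left(\bar X + r_{\alpha/2}\frac{S}{\sqrt n}\right) - \left(\bar X - r_{\alpha/2}\frac{S}{\sqrt n}\right) = 2r_{\alpha/2}\frac{S}{\sqrt n}$$
In this section, L is given and α must be found; for that, $r_{\alpha/2}$ must be found first:
$$r_{\alpha/2} = \frac{L\sqrt n}{2S} = \frac{1\%\cdot\sqrt7}{2\cdot0.6\%} = 2.20$$
> 1-pt(2.20, 7-1)
[1] 0.0350509
In the table of the t law, it is found that α/2 = 0.035, so α = 0.07 and 1 − α = 0.93. The confidence level is 93%.
c) Confidence interval for the standard deviation: To choose the new statistic, we take into account the following:
The variable of interest follows a normal distribution.
The quantity of interest is the standard deviation σ.
The population mean μ is unknown.
The sample size is small, n = 7, so we should not think about the asymptotic framework.
From a table of statistics (e.g. in [T]), the proper pivot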

$$T(X;\sigma) = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
is selected. Then
$$1-\alpha = P\left(l_{\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le r_{\alpha/2}\right) = P\left(\frac{1}{r_{\alpha/2}} \le \frac{\sigma^2}{(n-1)S^2} \le \frac{1}{l_{\alpha/2}}\right) = P\left(\frac{(n-1)S^2}{r_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{l_{\alpha/2}}\right),$$
and hence the interval for σ is
$$I_{1-\alpha} = \left[\sqrt{\frac{(n-1)S^2}{r_{\alpha/2}}},\ \sqrt{\frac{(n-1)S^2}{l_{\alpha/2}}}\right]$$
The quantities in the formula are: sample size n = 7, so n − 1 = 6;
> qchisq(c(0.05, 1-0.05), 7-1)
[1]  1.635383 12.591587
and $S^2 = 0.36\%^2$. Since α = 0.1 and κ = 6, the quantiles are $l_{0.05} = 1.64$ and $r_{0.05} = 12.6$. By substituting and applying the square root function, the interval is
$$I_{0.9} = \left[\sqrt{\frac{6\cdot0.36\%^2}{12.6}},\ \sqrt{\frac{6\cdot0.36\%^2}{1.64}}\right] = [0.414\%,\ 1.148\%]$$
Conclusion: The length in section b) is smaller than in section a), that is, the interval is narrower and the confidence is smaller.

My notes:

Exercise 2ci-m
In the library of a university, the mean duration (in days, d) of the borrowing period seems to be 20 d. A simple random sample of 100 books is analysed, and the values 18 d and 8 d² are obtained for the sample mean and the sample variance, respectively. Construct a 99% confidence interval for the mean duration of the borrowings to check if the initial population value is inside.
Discussion: For so many data, asymptotic results are considered. The method of the pivotal quantity can also be applied. The dimension of the variable duration is time, while the unit of measurement is days.
Identification of the variable: X = duration of one borrowing, X ~ ?
Sample information: Theoretical (simple random) sample: $X_1,\ldots,X_{100}$ s.r.s. (n = 100). Empirical sample: $x_1,\ldots,x_{100}$, with $\bar x = 18$ d and $s^2 = 8$ d².
The values $x_j$ of the sample are unknown; instead, the values of some statistics are given. These quantities must be sufficient for the calculations, and, therefore, the formulas must be written in terms of $\bar X$ and $S^2$.
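Since only $\bar x$ and $s^2$ are available, the interval derived next can be computed from these two summaries alone. This is a minimal R sketch. It uses base R's `qnorm()` for the exact normal quantile (about 2.5758, rather than the rounded 2.58 of the derivation), so the endpoints may differ from the hand calculation in the last digit.

```r
n <- 100; xbar <- 18; s2 <- 8                # summaries from the statement (d, d^2)
S2 <- n * s2 / (n - 1)                       # sample quasivariance, about 8.08 d^2
alpha <- 0.01
r <- qnorm(1 - alpha / 2)                    # standard normal quantile, about 2.5758
ci <- xbar + c(-1, 1) * r * sqrt(S2 / n)     # asymptotic interval for mu
round(ci, 2)                                 # 17.27 18.73
```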

Confidence interval: To select the pivot, we take into account the following:
Nothing is said about the probability distribution of the variable of interest.
The sample size is big, n = 100 > 30, so an asymptotic expression can be considered.
The population variance is unknown, but it is estimated through the sample quasivariance.
From a table of statistics (e.g. in [T]), the proper pivot
$$T(X;\mu) = \frac{\bar X - \mu}{\sqrt{S^2/n}} \xrightarrow{d} N(0,1)$$
is chosen, where $S^2$ is the sample quasivariance. By applying the method of the pivotal quantity:
$$1-\alpha = P\left(l_{\alpha/2} \le T(X;\mu) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{\bar X - \mu}{\sqrt{S^2/n}} \le r_{\alpha/2}\right) = P\left(\bar X - r_{\alpha/2}\sqrt{\frac{S^2}{n}} \le \mu \le \bar X + r_{\alpha/2}\sqrt{\frac{S^2}{n}}\right)$$
Then, the interval is
$$I_{1-\alpha} = \left[\bar X - r_{\alpha/2}\sqrt{\frac{S^2}{n}},\ \bar X + r_{\alpha/2}\sqrt{\frac{S^2}{n}}\right],$$
where $r_{\alpha/2}$ is the quantile such that $P(Z > r_{\alpha/2}) = \alpha/2$.
Substitution: We calculate the quantities involved in the formula. The sample mean is $\bar x = 18$ d. For a confidence of 99%, α = 0.01 and $r_{\alpha/2} = 2.58$. To calculate $S^2$, the property $(n-1)S^2 = \sum_j (x_j - \bar x)^2 = ns^2$ is used:
$$S^2 = \frac{n}{n-1}\,s^2 = \frac{100}{99}\,8\text{ d}^2 = 8.1\text{ d}^2$$
The interval is
$$I_{0.99} = \left[18\text{ d} - 2.58\sqrt{\frac{8.1\text{ d}^2}{100}},\ 18\text{ d} + 2.58\sqrt{\frac{8.1\text{ d}^2}{100}}\right] = [17.27\text{ d},\ 18.73\text{ d}]$$
Conclusion: With 99% confidence, the mean duration of the borrowings belongs to the interval obtained. The initial value μ = 20 d is not inside this high-confidence interval, that is, it is not supported by the data. Remember: statistical results depend on the assumptions, the methods, the certainty and the data.

My notes:

Exercise 3ci-m
The accounting firm Price Waterhouse periodically monitors the U.S. Postal Service's performance. One parameter of interest is the percentage of mail delivered on time. In a simple random sample of 332,000
mailed items, Price Waterhouse determined that 282,200 items were delivered on time (Tampa Tribune, March 26, 1995). Use this information to estimate, with 99% confidence, the true percentage of items delivered on time by the U.S. Postal Service. (Taken from: Statistics. J.T. McClave and T. Sincich. Pearson.)
Discussion: The population is characterized by a Bernoulli variable, since there are only two possible values for each item. We must construct a confidence interval for the proportion (a percent is a proportion expressed in a 0-to-100 scale). Proportions have no dimension.
Identification of the variable: X = delivered on time (one item)? X ~ B(η)
Confidence interval: For this kind of population and amount of data, we use the statistic
$$T(X;\eta) = \frac{\hat\eta - \eta}{\sqrt{\frac{?(1-?)}{n}}} \xrightarrow{d} N(0,1),$$
where ? is substituted by η or $\hat\eta$. For confidence intervals, η is unknown and no value is supposed, and hence it is estimated through the sample proportion. By applying the method of the pivot:
$$1-\alpha = P\left(l_{\alpha/2} \le T(X;\eta) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{\hat\eta - \eta}{\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}} \le r_{\alpha/2}\right) = P\left(\hat\eta - r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}} \le \eta \le \hat\eta + r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}\right)$$
Then, the interval is
$$I_{1-\alpha} = \left[\hat\eta - r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}},\ \hat\eta + r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}\right]$$
Substitution: We calculate the quantities in the formula, n = 332000 and
$$\hat\eta = \frac{282200}{332000} = 0.850; \qquad 99\% \Rightarrow 1-\alpha = 0.99 \Rightarrow \alpha = 0.01 \Rightarrow \alpha/2 = 0.005 \Rightarrow r_{\alpha/2} = r_{0.005} = l_{0.995} = 2.58$$
So
$$I_{0.99} = \left[0.850 - 2.58\sqrt{\frac{0.850(1-0.850)}{332000}},\ 0.850 + 2.58\sqrt{\frac{0.850(1-0.850)}{332000}}\right] = [0.848,\ 0.852]$$
Conclusion: With a confidence of 0.99, measured in a 0-to-1 scale, the value of η will be in the interval
obtained. On average, 99% of the times the method applied provides a right interval. Nonetheless, we frequently do not know the real η, and therefore we will never know whether the method has failed or not. Remember: statistical results depend on the assumptions, the methods, the certainty and the data.

My notes:

Exercise 4ci-m
Two independent groups, A and B, consist of 100 people each, all of whom have a disease. A serum is given to group A but not to group B, which are termed treatment and control groups, respectively; otherwise, the two groups are treated identically. Two simple random samples show that 75 and 65 people, respectively, recover from the disease in the two groups. To study the effect of the serum, build a 95% confidence interval for the difference $\eta_A - \eta_B$. Does the interval contain the case $\eta_A = \eta_B$?
Discussion: There are two independent Bernoulli populations. The interval for the difference of proportions is built by applying the method of the pivot. Proportions are, by definition, dimensionless quantities.
Identification of the variable: Having got better or not is a dichotomous situation:
A = recovering (an individual of the treatment group)? A ~ B(η_A)
B = recovering (an individual of the control group)? B ~ B(η_B)
1) Pivot: We take into account the following:
There are two independent Bernoulli populations.
Both sample sizes are large, 100, so an asymptotic approximation can be applied.
From a table of statistics (e.g. in [T]), the following pivot is selected:
$$T(A,B;\eta_A,\eta_B) = \frac{(\hat\eta_A - \hat\eta_B) - (\eta_A - \eta_B)}{\sqrt{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}}} \xrightarrow{d} N(0,1)$$
2) Event rewriting:
$$1-\alpha = P\left(l_{\alpha/2} \le T \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{(\hat\eta_A - \hat\eta_B) - (\eta_A - \eta_B)}{\sqrt{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}}} \le r_{\alpha/2}\right)$$
$$= P\left((\hat\eta_A - \hat\eta_B) - r_{\alpha/2}\sqrt{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}} \le \eta_A - \eta_B \le (\hat\eta_A - \hat\eta_B) + r_{\alpha/2}\sqrt{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}}\right)$$

3) The interval:
$$I_{1-\alpha} = \left[(\hat\eta_A - \hat\eta_B) - r_{\alpha/2}\sqrt{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}},\ (\hat\eta_A - \hat\eta_B) + r_{\alpha/2}\sqrt{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}}\right],$$
where $r_{\alpha/2}$ is the value of the standard normal distribution such that $P(Z > r_{\alpha/2}) = \alpha/2$.
Substitution: We need to calculate the quantities involved in the previous formula, with $n_A = 100$ and $n_B = 100$.
Theoretical (simple random) sample: $A_1,\ldots,A_{100}$ s.r.s. (each value is 1 or 0). Empirical sample: $a_1,\ldots,a_{100}$, with $\sum_{j=1}^{100} a_j = 75$, so
$$\hat\eta_A = \frac{1}{100}\sum_{j=1}^{100} a_j = \frac{75}{100} = 0.75$$
Theoretical (simple random) sample: $B_1,\ldots,B_{100}$ s.r.s. (each value is 1 or 0). Empirical sample: $b_1,\ldots,b_{100}$, with $\sum_{j=1}^{100} b_j = 65$, so
$$\hat\eta_B = \frac{1}{100}\sum_{j=1}^{100} b_j = \frac{65}{100} = 0.65$$
95% ⇒ 1 − α = 0.95 ⇒ α = 0.05 ⇒ α/2 = 0.025 ⇒ $r_{\alpha/2} = 1.96$.
Then,
$$I_{0.95} = \left[(0.75 - 0.65) - 1.96\sqrt{\frac{0.75(1-0.75)}{100} + \frac{0.65(1-0.65)}{100}},\ (0.75 - 0.65) + 1.96\sqrt{\frac{0.75(1-0.75)}{100} + \frac{0.65(1-0.65)}{100}}\right] = [-0.0263,\ 0.2263]$$
The case $\eta_A = \eta_B$ is included in the interval.
Conclusion: The lack-of-effect case ($\eta_A = \eta_B$) cannot be excluded when the decision has 95% confidence. Since η ∈ [0, 1], any reasonable estimator of η should provide values in this range or close to it. Because of the natural uncertainty of the sampling process (randomness and variability), in this case the smaller endpoint of the interval was −0.0263, which can be interpreted as being 0. When an interval of high confidence is far from 0, the case $\eta_A = \eta_B$ can clearly be discarded or rejected. Finally, it is important to notice that a confidence interval can be used to make decisions about hypotheses on the parameter values (it is equivalent to a two-sided hypothesis test, as the interval is also two-sided). Remember: statistical results depend on the assumptions, the methods, the certainty and the data.
Advanced theory: When the assumption $\eta_A = \eta = \eta_B$ seems reasonable (notice that this case is included in the 95% confidence interval just calculated), it makes sense to estimate the common variance of the estimator $\hat\eta_A - \hat\eta_B$ as well as possible. This can be done by using the pooled sample proportion
$$\hat\eta_p = \frac{n_A\hat\eta_A + n_B\hat\eta_B}{n_A + n_B}$$
to estimate η(1 − η) in the denominator. Nonetheless, the pooled estimator should not be used in the numerator, as $\hat\eta_p - \hat\eta_p = 0$ whatever the data are.
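The statistic would be:
$$\tilde T(A,B;\eta_A,\eta_B) = \frac{(\hat\eta_A - \hat\eta_B) - (\eta_A - \eta_B)}{\sqrt{\frac{\hat\eta_p(1-\hat\eta_p)}{n_A} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_B}}} \xrightarrow{d} N(0,1)$$
Now, the expression of the interval would be
$$\tilde I_{1-\alpha} = \left[(\hat\eta_A - \hat\eta_B) - r_{\alpha/2}\sqrt{\frac{\hat\eta_p(1-\hat\eta_p)}{n_A} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_B}},\ (\hat\eta_A - \hat\eta_B) + r_{\alpha/2}\sqrt{\frac{\hat\eta_p(1-\hat\eta_p)}{n_A} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_B}}\right]$$
The quantities involved in the previous formula are $n_A = 100$ and $n_B = 100$.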
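Both intervals of this exercise (unpooled and pooled) can be reproduced with a few lines of R, together with the ratio of their lengths that is discussed in what follows. This sketch is an illustration only. It uses the exact quantile `qnorm(0.975)` instead of the rounded 1.96, so the last digit may differ slightly from the hand calculation.

```r
nA <- 100; nB <- 100
etaA <- 75 / nA; etaB <- 65 / nB                               # sample proportions
r <- qnorm(0.975)                                              # about 1.96
se   <- sqrt(etaA * (1 - etaA) / nA + etaB * (1 - etaB) / nB)  # unpooled standard error
etaP <- (nA * etaA + nB * etaB) / (nA + nB)                    # pooled proportion, 0.70
se_p <- sqrt(etaP * (1 - etaP) / nA + etaP * (1 - etaP) / nB)  # pooled standard error
I  <- (etaA - etaB) + c(-1, 1) * r * se                        # interval I
It <- (etaA - etaB) + c(-1, 1) * r * se_p                      # interval I-tilde
round(rbind(I = I, I_tilde = It), 4)
se / se_p                                                      # L / L-tilde, about 0.994
```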

Since $\hat\eta_A = 0.75$ and $\hat\eta_B = 0.65$, the pooled estimate is
$$\hat\eta_p = \frac{n_A\hat\eta_A + n_B\hat\eta_B}{n_A + n_B} = \frac{\hat\eta_A + \hat\eta_B}{2} = \frac{0.75 + 0.65}{2} = 0.70$$
(the second equality holds because $n_A = n_B$). 95% ⇒ 1 − α = 0.95 ⇒ α = 0.05 ⇒ α/2 = 0.025 ⇒ $r_{\alpha/2} = 1.96$. Then,
$$\tilde I_{0.95} = \left[(0.75 - 0.65) - 1.96\sqrt{\frac{0.70(1-0.70)}{100} + \frac{0.70(1-0.70)}{100}},\ (0.75 - 0.65) + 1.96\sqrt{\frac{0.70(1-0.70)}{100} + \frac{0.70(1-0.70)}{100}}\right] = [-0.0270,\ 0.2270]$$
One way to measure how different the results are consists in directly comparing the lengths (twice the margin of error) in both cases:
$$\tilde L = 0.2270 + 0.0270 = 0.254, \qquad L = 0.2263 + 0.0263 = 0.253$$
Even if the length $\tilde L$ of the pooled interval is larger, this interval is theoretically more trustworthy than the other when $\eta_A = \eta = \eta_B$ is true. The general expressions of these lengths can be found too:
$$L = 2r_{\alpha/2}\sqrt{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}}, \qquad \tilde L = 2r_{\alpha/2}\sqrt{\frac{\hat\eta_p(1-\hat\eta_p)}{n_A} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_B}}$$
Another way to measure how different the results are can be based on comparing the statistics:
$$\frac{\tilde T}{T} = \frac{\sqrt{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}}}{\sqrt{\frac{\hat\eta_p(1-\hat\eta_p)}{n_A} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_B}}} = \frac{L}{\tilde L}, \qquad\text{so}\qquad \tilde T\,\tilde L = T\,L.$$
Thus, the quantity
$$\sqrt{\frac{\frac{\hat\eta_A(1-\hat\eta_A)}{n_A} + \frac{\hat\eta_B(1-\hat\eta_B)}{n_B}}{\frac{\hat\eta_p(1-\hat\eta_p)}{n_A} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_B}}} = 0.99$$
can be seen as a measure of the effect of using the pooled sample proportion. This effect is small in this exercise, but it could be larger in other situations. The case $\eta_A = \eta = \eta_B$ is also included in this interval, which is not noteworthy, since it has been used as an assumption. Nevertheless, the exclusion of this case would have contradicted the initial assumption.

My notes:

[CI] Minimum Sample Size
Remark 4ci: When the minimum sample size that guarantees a given precision is calculated by the method based on the margin of error, two other results are used: the theorem giving the sampling distribution of the pivot T, and the method of the pivot.

When the proper statistic T is based on the supposition that the population variable follows a given parametric probability distribution, the whole process can be seen as a parametric approach; when T is based on an asymptotic result, the nonparametric Central Limit Theorem is indirectly being applied. On the other hand, the method based on Chebyshev's inequality is valid whatever the probability distribution of the population variable and the nonnegative function h(x). The Central Limit Theorem, being a nonparametric result, seems more powerful than Chebyshev's inequality, which is based on a rough bound (see the appendixes). As a consequence, we expect the method based on this inequality to overestimate the minimum sample size. On the contrary, the number provided by the method based on the margin of error may be less trustworthy if the assumptions on which it is based are false.

Remark 5ci: Once there is a discrete quantity in an equation, the unknown cannot take any possible value. This implies that, strictly speaking, equalities like $E_g = r_{\alpha/2}\frac{\sigma}{\sqrt n}$ or $\alpha = \frac{\sigma^2}{nE_g^2}$ may never be fulfilled for continuous $E_g$, $\alpha$, $\sigma$ and discrete $n$. Solving the equality and rounding the result upward is an alternative way to solving the inequalities $E_g \ge E = r_{\alpha/2}\frac{\sigma}{\sqrt n}$ or $\alpha \ge \frac{\sigma^2}{nE_g^2}$, where the purpose is to find the minimum $n$ for which the possible discrete values of the margin of error are smaller than or equal to the given precision $E_g$.

Exercise 1ci-s: The lengths (in millimeters, mm) of metal rods produced by an industrial process are normally distributed with a standard deviation of 1.8 mm. Based on a simple random sample of nine observations from this population, the 99% confidence interval for the population mean length was found to extend from 194.65 mm to 197.75 mm. Suppose that a production manager believes that the interval is too wide for practical use and, instead, requires a 99% confidence interval extending no further than 0.50 mm on each side of the sample mean. How large a sample is needed to achieve such an interval? Apply both the method based on the confidence interval and the method based on Chebyshev's inequality.
From: Statistics for Business and Economics, Newbold, P., W.L. Carlson and B.M. Thorne, Pearson.

Discussion: There is one normal population with known standard deviation. By using a sample of nine elements, a 99% confidence interval was built, $I_1 = [194.65\text{ mm},\ 197.75\text{ mm}]$, of length $197.75\text{ mm} - 194.65\text{ mm} = 3.1\text{ mm}$ and margin of error $3.1\text{ mm}/2 = 1.55\text{ mm}$. A narrower interval is desired, and the number of data necessary in the new sample must be calculated. More data will be necessary for the new margin of error to be smaller ($0.50 < 1.55$) while the other quantities (standard deviation and confidence) are the same.

Identification of the variable: $X$ = Length of one metal rod, $X \sim N(\mu, \sigma = 1.8\text{ mm})$.

Sample information: Theoretical simple random sample: $X = (X_1, \dots, X_n)$ s.r.s. (the lengths of $n$ rods are taken).

Margin of error: We need the expression of the margin of error. If we do not remember it, we can apply the method of the pivot to take the expression from the formula of the interval:

$$I_{1-\alpha} = \left[\bar X - r_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar X + r_{\alpha/2}\frac{\sigma}{\sqrt n}\right].$$

If we remember the expression, we can use it directly. Either way, the margin of error for one normal population with known variance is:

$$E = r_{\alpha/2}\frac{\sigma}{\sqrt n}.$$

Sample size

Method based on the confidence interval: We want the margin of error $E$ to be smaller than or equal to the given $E_g$:

$$E_g \ge E = r_{\alpha/2}\frac{\sigma}{\sqrt n} \;\to\; \sqrt n \ge r_{\alpha/2}\frac{\sigma}{E_g} \;\to\; n \ge \left(r_{\alpha/2}\frac{\sigma}{E_g}\right)^2 = \left(2.58\,\frac{1.8\text{ mm}}{0.5\text{ mm}}\right)^2 = 86.27 \;\to\; n \ge 87,$$

since $r_{\alpha/2} = r_{0.01/2} = r_{0.005} = 2.58$. The inequality does not change either when multiplying or dividing by positive quantities or squaring, while it changes when inverting.

Method based on Chebyshev's inequality: For unbiased estimators, it holds that

$$P\left(|\hat\theta - \theta| \ge E\right) \le \frac{Var(\hat\theta)}{E^2},$$

so it suffices that $\frac{Var(\hat\theta)}{E_g^2} \le \alpha$. Since $Var(\hat\theta) = Var(\bar X) = \frac{\sigma^2}{n}$,

$$\alpha \ge \frac{\sigma^2}{nE_g^2} \;\to\; n \ge \frac{\sigma^2}{\alpha E_g^2} = \frac{(1.8\text{ mm})^2}{0.01\,(0.5\text{ mm})^2} = 1296.$$

Conclusion: At least 87 data (method based on the confidence interval) or 1296 data (method based on Chebyshev's inequality) are necessary to guarantee that the margin of error does not exceed 0.50 (this margin can be thought of as the maximum error in probability, in the sense that the distance or error $|\hat\theta - \theta|$ will be smaller than $E_g$ with a probability of $1-\alpha = 0.99$, but larger with a probability of $\alpha = 0.01$). Any number of data larger than the minimum would guarantee (and go beyond) the precision desired. As expected, more data are necessary ($87 > 9$) to increase the accuracy (narrower interval) with the same confidence. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). Remember: statistical results depend on the assumptions, the methods, the certainty and the data.

[CI] Methods and Sample Size

Exercise 1ci: The mark of an aptitude exam follows a normal distribution with standard deviation equal to 28.2. A simple random sample with nine students yields the following results:

$$\sum_{j=1}^{9} x_j = 1{,}098, \qquad \sum_{j=1}^{9} x_j^2 = 38{,}8$$

a) Find a 90% confidence interval for the population mean μ.
b) Discuss (without calculations) whether the length of a 95% confidence interval will be smaller than, greater than or equal to the length of the interval of the previous section.
c) How large must the minimum sample size be to obtain a 90% confidence interval with length (distance between the endpoints) equal to 10? Apply the method based on the confidence interval and also the method based on Chebyshev's inequality.

Discussion: The supposition that the normal distribution is an appropriate model for the variable mark should be evaluated. The method of the pivot will be applied. After obtaining the theoretical expression of the interval, it is possible to reason on the relation confidence-length. Given the length of the interval, the expression also allows us to calculate the minimum number of data necessary. The mark can be seen as a quantity without any dimension. Finally, it is worth noticing that an approximation is used, since the mark is a discrete variable while the normal distribution is continuous.

Identification of the variable: $X$ = Mark of one student, $X \sim N(\mu, \sigma = 28.2)$.

Sample information: Theoretical simple random sample: $X = (X_1, \dots, X_9)$ s.r.s. (the marks of nine students are to be taken). Empirical sample: $x = (x_1, \dots, x_9)$, with $\sum_{j=1}^{9} x_j = 1{,}098$ and $\sum_{j=1}^{9} x_j^2 = 38{,}8$ (the marks have been taken). We can see that the sample values $x_j$ themselves are unknown in this exercise; instead, information calculated from them is provided; this information must be sufficient for carrying out the calculations.

a) Method of the pivotal quantity: To choose the proper statistic with which the confidence interval is calculated, we take into account that: the variable follows a normal distribution; we are given the value of the population standard deviation σ; the sample size is small, n = 9, so asymptotic formulas cannot be applied. From a table of statistics (e.g. in [T]), the pivot

$$T(X;\mu) = \frac{\bar X - \mu}{\sqrt{\sigma^2/n}} \sim N(0,1)$$

is selected. Then

$$1-\alpha = P\left(l_{\alpha/2} \le T(X;\mu) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{\bar X - \mu}{\sigma/\sqrt n} \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2}\frac{\sigma}{\sqrt n} \le \bar X - \mu \le r_{\alpha/2}\frac{\sigma}{\sqrt n}\right)$$
$$= P\left(-\bar X - r_{\alpha/2}\frac{\sigma}{\sqrt n} \le -\mu \le -\bar X + r_{\alpha/2}\frac{\sigma}{\sqrt n}\right) = P\left(\bar X - r_{\alpha/2}\frac{\sigma}{\sqrt n} \le \mu \le \bar X + r_{\alpha/2}\frac{\sigma}{\sqrt n}\right),$$

so

$$I_{1-\alpha} = \left[\bar X - r_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar X + r_{\alpha/2}\frac{\sigma}{\sqrt n}\right],$$

where $r_{\alpha/2}$ is the value of the standard normal distribution verifying $P(Z > r_{\alpha/2}) = \alpha/2$, that is, the value such that an area equal to $\alpha/2$ is on the right (upper) tail.

Substitution: We calculate the quantities in the formula: $\bar x = \frac{1}{9}\sum_{j=1}^{9} x_j = \frac{1{,}098}{9} = 122$. A 90% confidence level implies that $\alpha = 0.1$, and the quantile $r_{\alpha/2} = r_{0.05} = 1.645$ is in the table.

From the statement, $n = 9$ and $\sigma = 28.2$. Finally, the interval is

$$I_{0.9} = \left[122 - 1.645\,\frac{28.2}{\sqrt 9},\ 122 + 1.645\,\frac{28.2}{\sqrt 9}\right] = [106.54,\ 137.46].$$

b) Length of the interval: To answer this question it is possible to argue that, when all the parameters but the length are fixed, if higher certainty is desired it is necessary to widen the interval, that is, to increase the distance between the two endpoints. The formal way to justify this idea consists in using the formula of the interval:

$$L = \left(\bar X + r_{\alpha/2}\frac{\sigma}{\sqrt n}\right) - \left(\bar X - r_{\alpha/2}\frac{\sigma}{\sqrt n}\right) = 2r_{\alpha/2}\frac{\sigma}{\sqrt n}.$$

Now, if σ and n remain unchanged, to study how L changes with α it is enough to see how the quantile moves. For the 95% interval: α = 0.05, so α decreases with respect to the value in section a) → $r_{\alpha/2}$ must leave less area (probability) on the right → $r_{\alpha/2}$ increases → L increases. In short, when the tails α get smaller the interval $1-\alpha$ gets wider, and vice versa.

c) Sample size. Method based on the confidence interval: Now the 90% confidence interval of the first section is revisited. For given α and $L_g$, the value of n must be found. From the expression of the length,

$$L_g \ge L = 2r_{\alpha/2}\frac{\sigma}{\sqrt n} \;\to\; \sqrt n \ge 2r_{\alpha/2}\frac{\sigma}{L_g} \;\to\; n \ge \left(2r_{\alpha/2}\frac{\sigma}{L_g}\right)^2 = \left(2\cdot 1.645\,\frac{28.2}{10}\right)^2 = 86.08 \;\to\; n \ge 87.$$

Only when inverting must the inequality be changed.

Method based on Chebyshev's inequality: For unbiased estimators, $P(|\hat\theta - \theta| \ge E) \le \frac{Var(\hat\theta)}{E^2}$; with $Var(\hat\theta) = Var(\bar X) = \frac{\sigma^2}{n}$ and $E_g = L_g/2 = 5$,

$$\alpha \ge \frac{\sigma^2}{nE_g^2} \;\to\; n \ge \frac{\sigma^2}{\alpha E_g^2} = \frac{28.2^2}{0.1\cdot 5^2} = 318.10 \;\to\; n \ge 319.$$

Conclusion: Given the other quantities, confidence grows with the length, and vice versa. If a sample size larger than the minimum were considered, a more accurate interval would be obtained; nevertheless, in practice this would usually also imply higher expense of both time and money. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). Remember: statistical results depend on the assumptions, the methods, the certainty and the data.
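The arithmetic of Exercise 1ci can be reproduced with the short R sketch below. The helper names min_n_ci and min_n_cheb are hypothetical: they implement the two rules used in this section for a normal mean with known σ, $n \ge (r_{\alpha/2}\sigma/E_g)^2$ and $n \ge \sigma^2/(\alpha E_g^2)$, rounded upward as in Remark 5ci. The table quantile 1.645 is passed explicitly so that the results match the hand calculations, while the length comparison of b) uses R's exact quantiles.

```r
# Exercise 1ci: normal marks, known sigma = 28.2, n = 9, sum of marks = 1098
sigma <- 28.2
n <- 9
xbar <- 1098 / n                 # sample mean: 122
r <- 1.645                       # table quantile r_{0.05} for 90% confidence

# (a) 90% confidence interval for the mean
ci90 <- xbar + c(-1, 1) * r * sigma / sqrt(n)

# (b) length L = 2 * r_{alpha/2} * sigma / sqrt(n), with exact normal quantiles
interval_length <- function(alpha) 2 * qnorm(1 - alpha / 2) * sigma / sqrt(n)

# (c) minimum sample sizes (hypothetical helpers); subtracting a tiny epsilon
# before ceiling() guards against floating-point noise on exact integers
min_n_ci <- function(sigma, Eg, r) ceiling((r * sigma / Eg)^2 - 1e-9)
min_n_cheb <- function(sigma, Eg, alpha) ceiling(sigma^2 / (alpha * Eg^2) - 1e-9)

Lg <- 10                         # desired length, so Eg = Lg / 2 = 5
n_ci <- min_n_ci(sigma, Lg / 2, r)
n_cheb <- min_n_cheb(sigma, Lg / 2, alpha = 0.10)

print(round(ci90, 2))            # interval endpoints
cat("L(90%):", round(interval_length(0.10), 2),
    " L(95%):", round(interval_length(0.05), 2), "\n")
cat("Minimum n, CI method:", n_ci, " Chebyshev:", n_cheb, "\n")
```

With the same helpers, min_n_ci(1.8, 0.5, 2.58) and min_n_cheb(1.8, 0.5, 0.01) give 87 and 1296 (Exercise 1ci-s), and min_n_ci(8.7, 0.5, 1.96) and min_n_cheb(8.7, 0.5, 0.05) give 1164 and 6056 (Exercise 3ci). Passing qnorm(0.995) ≈ 2.5758 instead of the rounded 2.58 in Exercise 1ci-s gives 86 rather than 87, which shows how sensitive the upward rounding is to the quantile used.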

Exercise 2ci: A 64-element simple random sample of petrol consumption (litres per 100 kilometers, u) in private cars has been taken, yielding a mean consumption of 9.36u and a standard deviation of 1.4u. Then:
a) Obtain a 96% confidence interval for the mean consumption.
b) Assume both normality for the consumption and variance $\sigma^2 = 2u^2$. How large must the sample be if, with the same confidence, we want the maximum error to be a quarter of a litre? Apply the method based on the confidence interval and the method based on Chebyshev's inequality.
From 2007's exams for access to the Spanish university.

Discussion: For 64 data, asymptotic results can be applied. The method of the pivotal quantity will be applied. The role of the number 100 is no other than being part of the units in which the data are measured. For the second section, additional suppositions added by myself are considered; in a real-world situation they should be evaluated.

Identification of the variable: $C$ = Consumption of one private car (measured in litres per 100 kilometers), $C \sim ?$

Sample information: Theoretical simple random sample: $C = (C_1, \dots, C_{64})$ s.r.s. Empirical sample: $c = (c_1, \dots, c_{64})$, with $\bar c = 9.36u$ and $s = 1.4u$. The values $c_j$ of the sample are unknown; instead, the evaluation of some statistics is given. These quantities ($\bar c$ and $s = 1.4u$) must be sufficient for the calculations, so formulas must involve $\bar C$ and $s^2$.

a) Confidence interval: To select the pivot, we take into account that: nothing is said about the probability distribution of the variable of interest; the sample size is big, n = 64 > 30, so an asymptotic expression can be used; the population variance is unknown, but it is estimated by the sample variance. From a table of statistics (e.g. in [T]), the following pivot is selected:

$$T(C;\mu) = \frac{\bar C - \mu}{\sqrt{S^2/n}} \xrightarrow{d} N(0,1),$$

where $S^2$ will be calculated by applying the relation $n s^2 = (n-1)S^2$. By applying the method of the pivot:

$$1-\alpha = P\left(l_{\alpha/2} \le T(C;\mu) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{\bar C - \mu}{\sqrt{S^2/n}} \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2}\sqrt{\frac{S^2}{n}} \le \bar C - \mu \le r_{\alpha/2}\sqrt{\frac{S^2}{n}}\right)$$
$$= P\left(\bar C - r_{\alpha/2}\sqrt{\frac{S^2}{n}} \le \mu \le \bar C + r_{\alpha/2}\sqrt{\frac{S^2}{n}}\right).$$

Then, the confidence interval is
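$$I_{1-\alpha} = \left[\bar C - r_{\alpha/2}\sqrt{\frac{S^2}{n}},\ \bar C + r_{\alpha/2}\sqrt{\frac{S^2}{n}}\right],$$

where $r_{\alpha/2}$ is the quantile such that $P(Z > r_{\alpha/2}) = \alpha/2$.

Substitution: We calculate the quantities in the formula. Sample mean: $\bar c = 9.36u$. For a confidence of 96%, $\alpha = 0.04$ and $r_{\alpha/2} = r_{0.04/2} = r_{0.02} = l_{0.98} = 2.054$. The sample quasivariance is

$$S^2 = \frac{n}{n-1}\,s^2 = \frac{64}{63}\,(1.4u)^2 = 1.99u^2.$$

Finally, the interval is

$$I_{0.96} = \left[9.36u - 2.054\sqrt{\frac{1.99u^2}{64}},\ 9.36u + 2.054\sqrt{\frac{1.99u^2}{64}}\right] = [9.00u,\ 9.72u].$$

b) Minimum sample size. Method based on the confidence interval: To select the pivot, we take into account the new suppositions: the variable of interest follows a normal distribution; the population mean is being studied; the population variance is known. From a table of statistics (e.g. in [T]), the following pivot is selected (now the exact sampling distribution is known, instead of the asymptotic distribution):

$$T(C;\mu) = \frac{\bar C - \mu}{\sqrt{\sigma^2/n}} \sim N(0,1).$$

By doing calculations similar to those of the previous section or exercise, the interval is

$$I_{1-\alpha} = \left[\bar C - r_{\alpha/2}\sqrt{\frac{\sigma^2}{n}},\ \bar C + r_{\alpha/2}\sqrt{\frac{\sigma^2}{n}}\right],$$

from which the expression of the margin of error is obtained, namely: $E = r_{\alpha/2}\sqrt{\frac{\sigma^2}{n}}$. Values can be substituted either before or after solving an inequality; this time let us use numbers from the beginning:

$$\frac{1}{4}u = E_g \ge E = 2.054\sqrt{\frac{2u^2}{n}} \;\to\; \sqrt n \ge 4\cdot 2.054\sqrt 2 \;\to\; n \ge 16\cdot 2.054^2\cdot 2 = 135.0 \;\to\; n \ge 136.$$

When inverting, the inequality must be changed.

Method based on Chebyshev's inequality: For unbiased estimators, $P(|\hat\theta - \theta| \ge E) \le \frac{Var(\hat\theta)}{E^2}$, so it suffices that $\frac{Var(\hat\theta)}{E_g^2} \le \alpha$; here $Var(\hat\theta) = Var(\bar C) = \frac{\sigma^2}{n}$, hence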
$$\alpha \ge \frac{\sigma^2}{nE_g^2} \;\to\; n \ge \frac{\sigma^2}{\alpha E_g^2} = \frac{2u^2}{0.04\,(0.25u)^2} = 800.$$

Conclusion: The unknown mean petrol consumption of the population of private cars belongs to the interval obtained with 96% confidence. For 64 data, the margin of error was $2.054\sqrt{1.99u^2/64} = 0.36u$, while 136 data are needed for the margin to be $1/4 = 0.25u$. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). Remember: statistical results depend on the assumptions, the methods, the certainty and the data.

Exercise 3ci: You have been hired by a consortium of dairy farmers to conduct a survey about the consumption of milk. Based on results from a pilot study, assume that σ = 8.7 oz. Suppose that the amount of milk is normally distributed. If you want to estimate the mean amount of milk consumed daily by adults:
a) How many adults must you survey if you want 95% confidence that your sample mean is in error by no more than 0.5 oz? Apply both the method based on the confidence interval and the method based on Chebyshev's inequality.
b) Calculate the margin of error if the number of data in the sample were twice the minimum (rounded) value that you obtained. Is the margin of error now half the value it was?
Based on an exercise of: Elementary Statistics. Triola, M.F. Pearson.

CULTURAL NOTE From: Wikipedia. A fluid ounce (abbreviated fl oz, fl. oz. or oz. fl., old forms ℥, fl ℥, f℥, ƒ ℥) is a unit of volume (also called capacity) typically used for measuring liquids. It is equivalent to approximately 30 millilitres. Whilst various definitions have been used throughout history, two remain in common use: the imperial and the United States customary fluid ounce. An imperial fluid ounce is 1/20 of an imperial pint, 1/160 of an imperial gallon or approximately 28.4 ml. A US fluid ounce is 1/16 of a US fluid pint, 1/128 of a US fluid gallon or approximately 29.6 ml. The fluid ounce is distinct from the ounce, a unit of mass; however, it is sometimes referred to simply as an "ounce" where context makes the meaning clear.

Discussion: There is one normal population with known standard deviation.
In both sections, the answer can be found by using the expression of the margin of error.

Identification of the variable: $X$ = Amount of milk consumed daily by an adult, $X \sim N(\mu, \sigma = 8.7\text{ oz})$.

Sample information: Theoretical simple random sample: $X = (X_1, \dots, X_n)$ s.r.s. (the amount is measured for $n$ adults).

Formula for the margin of error: We need the expression of the margin of error. If we do not remember it, we can apply the method of the pivot to take the expression from the formula of the interval:
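$$I_{1-\alpha} = \left[\bar X - r_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar X + r_{\alpha/2}\frac{\sigma}{\sqrt n}\right].$$

If we remember the expression, we can use it directly. Either way, the margin of error for one normal population with known variance is:

$$E = r_{\alpha/2}\frac{\sigma}{\sqrt n}.$$

a) Sample size. Method based on the confidence interval: The equation involves four quantities, and we can calculate any of them once the others are known. Here:

$$E_g \ge E = r_{\alpha/2}\frac{\sigma}{\sqrt n} \;\to\; \sqrt n \ge r_{\alpha/2}\frac{\sigma}{E_g} \;\to\; n \ge \left(r_{\alpha/2}\frac{\sigma}{E_g}\right)^2 = \left(1.96\,\frac{8.7\text{ oz}}{0.5\text{ oz}}\right)^2 = 1163.08 \;\to\; n \ge 1164,$$

since $r_{\alpha/2} = r_{0.05/2} = r_{0.025} = 1.96$. The inequality does not change either when multiplying or dividing by positive quantities or squaring, while it changes when inverting.

Method based on Chebyshev's inequality: For unbiased estimators, $P(|\hat\theta - \theta| \ge E) \le \frac{Var(\hat\theta)}{E^2}$, so it suffices that $\frac{Var(\hat\theta)}{E_g^2} \le \alpha$; since $Var(\hat\theta) = Var(\bar X) = \frac{\sigma^2}{n}$,

$$\alpha \ge \frac{\sigma^2}{nE_g^2} \;\to\; n \ge \frac{\sigma^2}{\alpha E_g^2} = \frac{(8.7\text{ oz})^2}{0.05\,(0.5\text{ oz})^2} = 6055.2 \;\to\; n \ge 6056.$$

b) Margin of error. Way 1: Just by substituting:

$$E = r_{\alpha/2}\frac{\sigma}{\sqrt n} = 1.96\,\frac{8.7\text{ oz}}{\sqrt{2\cdot 1164}} = 0.3534\text{ oz}.$$

When the sample size is doubled, the margin of error is not reduced by half but by less than this amount.

Way 2 (suggested to me by a student): By managing the algebraic expression:

$$\tilde E = r_{\alpha/2}\frac{\sigma}{\sqrt{2n}} = \frac{1}{\sqrt 2}\,r_{\alpha/2}\frac{\sigma}{\sqrt n} = \frac{E}{\sqrt 2} = \frac{0.5\text{ oz}}{\sqrt 2} = 0.3535\text{ oz}.$$

Now it is easy to see that if the sample size is multiplied by 2, the margin of error is divided by $\sqrt 2$. Besides, more generally:

Proposition: For the confidence interval estimation of the mean of a normal population with known variance, based on the method of the pivot, when the sample size is multiplied by any scalar c the margin of error is divided by $\sqrt c$.

Notice that 0.5 is slightly larger than the real margin of error after rounding upward ($1.96 \cdot 8.7/\sqrt{1164} = 0.4998$ oz); that is why there is a small difference between the results of both ways.

Conclusion: At least 1164 or 6056 data are necessary to guarantee that the margin of error does not exceed 0.50 (this margin can be thought of as the maximum error in probability, in the sense that the distance or error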
$|\hat\theta - \theta|$ will be smaller than $E_g$ with a probability of $1-\alpha = 0.95$, but larger with a probability of $\alpha = 0.05$). When the sample size is multiplied by c, the margin of error is divided by $\sqrt c$. Using more data would also guarantee the precision desired. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). Remember: statistical results depend on the assumptions, the methods, the certainty and the data.

Exercise 4ci: A company makes two products, A and B, that can be considered independent and whose demands follow the distributions $N(\mu_A, \sigma_A = 70u)$ and $N(\mu_B, \sigma_B = 60u)$, respectively. After analysing 500 shops, the two simple random samples yield $\bar a = 156$ and $\bar b = 128$.
a) Build 95 and 98 percent confidence intervals for the difference between the population means.
b) What are the margins of error? If sales are measured in the unit u (number of boxes), what is the unit of measure of the margin of error?
c) A margin of error equal to 10 is desired; how many shops are necessary? Apply both the method based on the confidence interval and the method based on Chebyshev's inequality.
d) If only product A is considered, as if product B had not been analysed, how many shops are necessary to guarantee a margin of error equal to 10? Again, apply the two methods.

LINGUISTIC NOTE From: Longman Dictionary of Common Errors. Turton, N.D., and J.B. Heaton. Longman.
company. an organization that makes or sells goods or that sells services: 'My father works for an insurance company.' 'IBM is one of the biggest companies in the electronics industry.'
factory. a place where goods such as furniture, carpets, curtains, clothes, plates, toys, bicycles, sports equipment, drinks and packaged food are produced: 'The company's UK factory produces 500 golf trolleys a week.'
industry. (1) all the people, factories, companies etc involved in a major area of production: 'the steel industry', 'the clothing industry' (2) all industries considered together as a single thing: 'Industry has developed rapidly over the years at the expense of agriculture.'
mill.
(1) a place where a particular type of material is made: 'a cotton mill', 'a textile mill', 'a steel mill', 'a paper mill' (2) a place where flour is made from grain: 'a flour mill'
plant. a factory or building where vehicles, engines, weapons, heavy machinery, drugs or industrial chemicals are produced, where chemical processes are carried out, or where power is generated: 'Vauxhall-Opel's UK car plants', 'Honda's new engine plant at Swindon', 'a sewage plant', 'a wood treatment plant', 'ICI's 00m plant', 'the Sellafield nuclear reprocessing plant in Cumbria'
works. an industrial building where materials such as cement, steel, and bricks are produced, or where industrial processes are carried out: 'The drop in car and van sales has led to redundancies in the country's steel works.'

Discussion: The supposition that the normal distribution is appropriate to model both variables should be statistically tested. The independence of the two populations should be tested as well. The method of the pivot will be applied. After obtaining the theoretical expression of the interval, it is possible to argue about the relation confidence-length. Given the length of the interval, the expression allows us to calculate the minimum number of data necessary. The numbers of units demanded can be seen as dimensionless quantities. An approximation is implicitly being used in this exercise, since the number of units demanded is a discrete variable while the normal distribution is continuous.

a) Confidence interval. The variables are
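$A$ = Number of units of product A sold in one shop, $A \sim N(\mu_A, \sigma_A = 70u)$
$B$ = Number of units of product B sold in one shop, $B \sim N(\mu_B, \sigma_B = 60u)$

a1) Pivot: We know that there are two independent normal populations, we are interested in $\mu_A - \mu_B$, and the variances are known. Then, from a table of statistics (e.g. in [T]), we select

$$T(A,B;\mu_A,\mu_B) = \frac{(\bar A - \bar B) - (\mu_A - \mu_B)}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}} \sim N(0,1).$$

a2) Event rewriting:

$$1-\alpha = P\left(l_{\alpha/2} \le T \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{(\bar A - \bar B) - (\mu_A - \mu_B)}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}} \le r_{\alpha/2}\right)$$
$$= P\left(-r_{\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}} \le (\bar A - \bar B) - (\mu_A - \mu_B) \le r_{\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}\right)$$
$$= P\left((\bar A - \bar B) - r_{\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}} \le \mu_A - \mu_B \le (\bar A - \bar B) + r_{\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}\right).$$

a3) The interval:

$$I_{1-\alpha} = \left[(\bar A - \bar B) - r_{\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}},\ (\bar A - \bar B) + r_{\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}\right].$$

Substitution: The quantities in the formula are $\bar a = 156u$, $\sigma_A = 70u$, $n_A = 500$ and $\bar b = 128u$, $\sigma_B = 60u$, $n_B = 500$. At 95%, $1-\alpha = 0.95 \to \alpha = 0.05 \to \alpha/2 = 0.025 \to r_{\alpha/2} = r_{0.025} = l_{0.975} = 1.96$. At 98%, $1-\alpha = 0.98 \to \alpha = 0.02 \to \alpha/2 = 0.01 \to r_{\alpha/2} = r_{0.01} = l_{0.99} = 2.326$. Thus, at 95%,

$$I_{0.95} = \left[156 - 128 - 1.96\sqrt{\frac{70^2}{500} + \frac{60^2}{500}},\ 156 - 128 + 1.96\sqrt{\frac{70^2}{500} + \frac{60^2}{500}}\right] = [19.92,\ 36.08]$$

and at 98%,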
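$$I_{0.98} = \left[156 - 128 - 2.326\sqrt{\frac{70^2}{500} + \frac{60^2}{500}},\ 156 - 128 + 2.326\sqrt{\frac{70^2}{500} + \frac{60^2}{500}}\right] = [18.41,\ 37.59].$$

b) Margin of error: Regarding the units, they can be treated as any other algebraic letter representing a numerical quantity. The quantile and the sample sizes are dimensionless, while the variances are expressed in the unit $u^2$ because of the square in the definition $\sigma^2 = E\left[(X - E(X))^2\right]$ when data are measured in the unit u. At 95%

$$E_{0.95} = r_{\alpha/2}\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}} = 1.96\sqrt{\frac{70^2u^2}{500} + \frac{60^2u^2}{500}} = 1.96\sqrt{\frac{70^2}{500} + \frac{60^2}{500}}\;u = 8.08u$$

and at 98%

$$E_{0.98} = 2.326\sqrt{\frac{70^2u^2}{500} + \frac{60^2u^2}{500}} = 2.326\sqrt{\frac{70^2}{500} + \frac{60^2}{500}}\;u = 9.59u.$$

c) Minimum sample sizes. Method based on the confidence interval: Since here both sample sizes are equal to the number of shops n,

$$E_g \ge E = r_{\alpha/2}\sqrt{\frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{n}} \;\to\; \sqrt n \ge \frac{r_{\alpha/2}\sqrt{\sigma_A^2 + \sigma_B^2}}{E_g} \;\to\; n \ge r_{\alpha/2}^2\,\frac{\sigma_A^2 + \sigma_B^2}{E_g^2},$$

and hence at 95% and 98%, respectively,

$$n \ge 1.96^2\,\frac{(70u)^2 + (60u)^2}{(10u)^2} = 326.54 \to n \ge 327 \qquad\text{and}\qquad n \ge 2.326^2\,\frac{(70u)^2 + (60u)^2}{(10u)^2} = 459.87 \to n \ge 460.$$

Method based on Chebyshev's inequality: For unbiased estimators, $P(|\hat\theta - \theta| \ge E) \le \frac{Var(\hat\theta)}{E^2}$, so it suffices that $\frac{Var(\hat\theta)}{E_g^2} \le \alpha$. If $Var(\hat\theta) = Var(\bar A - \bar B) = \frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{n}$,

$$\alpha \ge \frac{\sigma_A^2 + \sigma_B^2}{nE_g^2} \;\to\; n \ge \frac{\sigma_A^2 + \sigma_B^2}{\alpha E_g^2},$$

so

$$n \ge \frac{(70u)^2 + (60u)^2}{0.05\,(10u)^2} = 1700 \qquad\text{and}\qquad n \ge \frac{(70u)^2 + (60u)^2}{0.02\,(10u)^2} = 4250.$$

d) Minimum sample size $n_A$. Method based on the confidence interval: In this case, when the method of the pivotal quantity is applied (we do not repeat the calculations here), the interval and the margin of error are, respectively,

$$I_{1-\alpha} = \left[\bar A - r_{\alpha/2}\frac{\sigma_A}{\sqrt{n_A}},\ \bar A + r_{\alpha/2}\frac{\sigma_A}{\sqrt{n_A}}\right] \qquad\text{and}\qquad E = r_{\alpha/2}\frac{\sigma_A}{\sqrt{n_A}}.$$

Note that this case can be thought of as a particular case where the second population has values $\bar B = 0$, $\mu_B = 0$ and $\sigma_B = 0$. Then,

$$E_g \ge E = r_{\alpha/2}\frac{\sigma_A}{\sqrt{n_A}} \;\to\; n_A \ge \left(r_{\alpha/2}\frac{\sigma_A}{E_g}\right)^2$$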
and hence at 95% and 98%, respectively,

$$n_A \ge \left(1.96\,\frac{70u}{10u}\right)^2 = 188.24 \to n_A \ge 189 \qquad\text{and}\qquad n_A \ge \left(2.326\,\frac{70u}{10u}\right)^2 = 265.10 \to n_A \ge 266.$$

Method based on Chebyshev's inequality: For unbiased estimators, $P(|\hat\theta - \theta| \ge E) \le \frac{Var(\hat\theta)}{E^2}$. If $Var(\hat\theta) = Var(\bar A) = \frac{\sigma_A^2}{n_A}$, then

$$\alpha \ge \frac{\sigma_A^2}{n_AE_g^2} \;\to\; n_A \ge \frac{\sigma_A^2}{\alpha E_g^2},$$

so

$$n_A \ge \frac{(70u)^2}{0.05\,(10u)^2} = 980 \qquad\text{and}\qquad n_A \ge \frac{(70u)^2}{0.02\,(10u)^2} = 2450.$$

Conclusion: As expected, when the probability of the tails α decreases, the margin of error (and hence the length) increases. For either one or two products and given the margin of error, the more confidence (less significance) we want, the more data we need. Since 500 shops were actually considered to attain this margin of error, there has been a waste of time and money: fewer shops (327 or 460 with the method based on the confidence interval) would have sufficed for the desired accuracy at 95% or 98%. When two independent quantities are added or subtracted, the error or uncertainty of the result can be as large as the total of the two individual errors or uncertainties; this also holds for random quantities (if they are dependent, a correction term, the covariance, appears). For this reason, to guarantee the same margin of error, more data are necessary in each of the two samples (notice that for two populations the minimum value is larger than or equal to the sum of the minimum values that would be necessary for each population individually for the same precision and confidence). The minimum sample sizes provided by the two methods are quite different (see remark 4ci). Remember: statistical results depend on the assumptions, the methods, the certainty and the data.
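The figures of Exercise 4ci can be reproduced with the R sketch below. It relies only on the observed difference of sample means, ā − b̄ = 28, the known standard deviations and the table quantiles 1.96 and 2.326 quoted above; the variable names are illustrative. Storing the two confidence levels in named vectors lets each formula produce the 95% and 98% results at once.

```r
# Exercise 4ci: two independent normal populations with known sigmas
sA <- 70; sB <- 60            # known standard deviations (unit u)
nA <- 500; nB <- 500          # shops analysed
dbar <- 28                    # observed difference of sample means, a - b

# Table quantiles and significance levels used in the text
r <- c("95%" = 1.96, "98%" = 2.326)
alpha <- c("95%" = 0.05, "98%" = 0.02)

# (a), (b) intervals and margins of error
se <- sqrt(sA^2 / nA + sB^2 / nB)
E <- r * se
lower <- dbar - E
upper <- dbar + E

Eg <- 10                      # desired margin of error (unit u)
eps <- 1e-9                   # guards ceiling() against floating-point noise

# (c) both products, common sample size n
n_two_ci <- ceiling(r^2 * (sA^2 + sB^2) / Eg^2 - eps)
n_two_cheb <- ceiling((sA^2 + sB^2) / (alpha * Eg^2) - eps)

# (d) product A alone
n_A_ci <- ceiling((r * sA / Eg)^2 - eps)
n_A_cheb <- ceiling(sA^2 / (alpha * Eg^2) - eps)

print(round(rbind(lower, upper, E), 2))
print(rbind(n_two_ci, n_two_cheb, n_A_ci, n_A_cheb))
```

Adding the single-population bounds before rounding at 95%, $188.24 + 138.30 = 326.54$ (product B alone needs $(1.96 \cdot 60/10)^2 = 138.30$), gives exactly the two-population bound, in line with the remark about summing minimum values in the conclusion.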

Hypothesis Tests

Remark 1ht: Like confidence, the concept of significance can be interpreted as a probability (so both are probabilities, although we sometimes use a 0-to-100 scale). See remark pt, in the appendix of Probability Theory, on the interpretation of the concept of probability.

Remark 2ht: The quantities α, p-value, β, 1 − β and φ are probabilities, so their values must be between 0 and 1.

Remark 3ht: For two-tailed tests, since there is an infinite number of pairs of quantiles $a_1, a_2$ such that $P(a_1 \le T_0 \le a_2) = 1-\alpha$, those that determine tails of probability $\alpha/2$ are considered by convention. This criterion is also applied for confidence intervals.

Remark 4ht: To apply the second methodology, bounding the p-value is sometimes enough to compare it with α. To do that, the proper closest value included in the table is used.

Remark 5ht: In calculating the p-value for two-tailed tests, by convention the probability of the tail determined by $T_0(x,y)$ is doubled. When $T_0(X,Y)$ follows an asymmetric distribution, it is difficult to identify the tail if the value of $T_0(x,y)$ is close to the median. In fact, knowing the median is not necessary, since if we select the wrong tail, twice its probability will be greater than 1 and we will realize that the other tail must have been considered. Alternatively, it is always possible to calculate the two probabilities (on the left and on the right) and double the minimum of them (this is useful in writing code for software programs).

Remark 6ht: When more than one test can be applied to make a decision about the same hypotheses, the most powerful should be considered (if it exists).

Remark 7ht: After making a decision, it is possible to evaluate the strength with which it was made: for the first methodology, by comparing the distance from the statistic to the critical values (or, better, the area between this set of values and the density function of $T_0$) and, for the second methodology, by looking at the magnitude of the p-value.

Remark 8ht: For small sample sizes, n = 2 or 3, the critical region obtained by applying any methodology can be plotted in the two- or three-dimensional space.
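The last suggestion of Remark 5ht (double the smaller of the two tail probabilities) can be written as a small R function. The name two_tailed_pvalue is hypothetical, and the sketch assumes a continuous $T_0$ whose cumulative distribution function under $H_0$ is passed as cdf. The chi-square call illustrates the asymmetric case, where doubling the wrong (left) tail gives a number greater than 1, exactly as the remark warns; the values 27.13 and 19 degrees of freedom are borrowed from Exercise 2ht-T purely for illustration.

```r
# Two-tailed p-value by doubling the smaller tail (Remark 5ht).
# cdf: cumulative distribution function of T0 under H0 (continuous case);
# extra arguments such as df are passed through '...'.
two_tailed_pvalue <- function(t0, cdf, ...) {
  left <- cdf(t0, ...)     # P(T0 <= t0)
  right <- 1 - left        # P(T0 > t0)
  2 * min(left, right)
}

# Symmetric null distribution: N(0, 1) with t0 = -1
p_norm <- two_tailed_pvalue(-1, pnorm)

# Asymmetric null distribution: chi-square with 19 degrees of freedom
p_chisq <- two_tailed_pvalue(27.13, pchisq, df = 19)

# Doubling the wrong (left) tail gives a value greater than 1
wrong_tail <- 2 * pchisq(27.13, df = 19)

cat("normal:", round(p_norm, 4), " chi-square:", round(p_chisq, 4),
    " wrong tail doubled:", round(wrong_tail, 4), "\n")
```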
[HT] Parametric

Remark 9ht: There are four types of pairs of hypotheses: (1) simple versus simple; (2) simple versus one-sided composite; (3) one-sided composite versus one-sided composite; (4) simple versus two-sided composite. We will directly apply the Neyman-Pearson lemma for the first case. When the solution of the first case does not depend upon any particular value of the parameter $\theta_1$ under $H_1$, the same test will be uniformly most powerful for the second case. In addition, when there is a uniformly most powerful test for the second case, it will also be uniformly most powerful for the third case.

Remark 10ht: Given $H_0$ and α, different decisions can be made for one- and two-tailed tests. That is why: (i) describing the details of the framework is of great importance in Statistics; and (ii) as a general rule, all trustworthy information must be used, which implies that a one-sided test should be used when there is information that strongly suggests so (compare the estimate calculated from the sample with the hypothesized values).

Remark 11ht: For parametric tests, $\alpha(\theta) = P(\text{Reject } H_0 \mid \theta \in \Theta_0)$ and $1-\beta(\theta) = P(\text{Reject } H_0 \mid \theta \in \Theta_1)$, so to plot the power function $\varphi(\theta) = P(\text{Reject } H_0 \mid \theta \in \Theta_0 \cup \Theta_1)$ it is usually enough to enter $\theta \in \Theta_0$ in the analytical expression of $1-\beta(\theta)$. This is the method that we have used in some exercises where the computer has been used.

Remark 12ht: A reasonable testing process should verify that $1-\beta(\theta_1) = P(T_0 \in R_c \mid \theta_1 \in \Theta_1) > P(T_0 \in R_c \mid \theta_0 \in \Theta_0) = \alpha(\theta_0)$, with $1-\beta(\theta_1) \to \alpha(\theta_0)$ when $\theta_1 \to \theta_0$. This can be noticed in the power functions plotted in some exercises, where there is a local minimum at $\theta_0$.

Remark 13ht: Since one-sided tests are, in their range of parameter values, more powerful than the corresponding two-sided test, the best way of testing an equality consists in accepting it when it is compared with the two types of inequality. Similarly, the best way
to test an inequality consists in accepting it when it is allocated either in the null hypothesis or in the alternative hypothesis. These ideas, among others, are rigorously explained in the materials of professor Alfonso Novales Cinca.

[HT-p] Based on T

Exercise 1ht-T: The lifetime of a machine (measured in years, y) follows a normal distribution with variance equal to 4y². A simple random sample of size 100 yields a sample mean equal to 1.3y. Test the null hypothesis that the population mean is equal to 1.5y, by applying a two-tailed test with 5 percent significance level. What is the type I error? Calculate the type II error when the population mean is 2y. Find the general expression of the type II error and then use a computer to plot the power function.

Discussion: First of all, the supposition that the normal distribution reasonably explains the lifetime of the machine should be evaluated by using proper statistical techniques. Nevertheless, the purpose of this exercise is basically to apply the decision-making methodologies.

Statistic: Since there is one normal population and the population variance is known, the statistic

$$T(X;\mu) = \frac{\bar X - \mu}{\sqrt{\sigma^2/n}} \sim N(0,1)$$

is selected from a table of statistics (e.g. in [T]). Two particular cases of T will be used:

$$T_0(X) = \frac{\bar X - \mu_0}{\sqrt{\sigma^2/n}} \sim N(0,1) \qquad\text{and}\qquad T_1(X) = \frac{\bar X - \mu_1}{\sqrt{\sigma^2/n}} \sim N(0,1).$$

To apply any of the two methodologies, the value of $T_0$ at the specific sample $x = (x_1, \dots, x_{100})$ is necessary:

$$T_0(x) = \frac{\bar x - \mu_0}{\sqrt{\sigma^2/n}} = \frac{1.3 - 1.5}{\sqrt{4/100}} = -1.$$

Hypotheses: The two-tailed test is determined by $H_0: \mu = \mu_0 = 1.5$ and $H_1: \mu = \mu_1 \ne 1.5$. For these hypotheses, the rejection region is made of the two tails of the distribution of $T_0$.
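The quantities used in the decision and in the type II error of Exercise 1ht-T can be computed with the R sketch below, assuming the setting of the statement ($\sigma^2 = 4$, $n = 100$, $\bar x = 1.3$, $\mu_0 = 1.5$, $\alpha = 0.05$). It uses the exact quantile qnorm(0.975) ≈ 1.95996 instead of the table value 1.96, so the fourth decimal may differ slightly from hand calculations; beta_fn and crit are illustrative names.

```r
# Exercise 1ht-T: two-tailed z test for a normal mean with known variance
variance <- 4; n <- 100; xbar <- 1.3; mu0 <- 1.5; alpha <- 0.05

se <- sqrt(variance / n)              # standard deviation of the sample mean
t0 <- (xbar - mu0) / se               # observed statistic T0(x)
crit <- qnorm(1 - alpha / 2)          # critical value r_{alpha/2}
reject <- abs(t0) > crit              # first methodology: is T0(x) in Rc?
pv <- 2 * pnorm(-abs(t0))             # second methodology: two-tailed p-value

# Type II error: beta(mu1) = P(-crit - d <= T1 <= crit - d), d = (mu1 - mu0) / se
beta_fn <- function(mu1) {
  d <- (mu1 - mu0) / se
  pnorm(crit - d) - pnorm(-crit - d)
}

cat("T0 =", round(t0, 4), " reject H0:", reject, " p-value =", round(pv, 4), "\n")
cat("beta(2) =", round(beta_fn(2), 4), " power at 2 =", round(1 - beta_fn(2), 4), "\n")
```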

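Decision: To make the final decision about the hypotheses, two main methodologies are available. To apply the first one, the critical values $a_1$ and $a_2$ that determine the rejection region are found by applying the definition of type I error, with α = 0.05 at $\mu_0 = 1.5$, and the criterion of leaving half the probability in each tail:

$$\alpha(1.5) = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(X;\mu) \in R_c \mid H_0) = P\left(\{T_0(X) < a_1\} \cup \{T_0(X) > a_2\}\right),$$
$$\frac{\alpha(1.5)}{2} = P(T_0(X) < a_1) \to a_1 = l_{\alpha/2} = -1.96, \qquad \frac{\alpha(1.5)}{2} = P(T_0(X) > a_2) \to a_2 = r_{\alpha/2} = 1.96,$$
$$R_c = \{T_0(X) < -1.96\} \cup \{T_0(X) > 1.96\} = \{|T_0(X)| > 1.96\}.$$

The decision is: $T_0(x) = -1 \to T_0(x) \notin R_c \to H_0$ is not rejected.

The second methodology is based on the calculation of the p-value:

$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(|T_0(X)| > |T_0(x)|) = P(|T_0(X)| > 1) = 2\,P(T_0(X) < -1) = 2\cdot 0.1587 = 0.32,$$

so $pv = 0.32 > 0.05 = \alpha \to H_0$ is not rejected.

Type II error: To calculate β, we have to work under $H_1$, that is, with $T_1$. Nonetheless, the critical region is expressed in terms of $T_0$. Thus, the mathematical trick of adding and subtracting the same quantity is applied:

$$\beta(\mu_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0(X) \notin R_c \mid H_1) = P(|T_0(X)| \le 1.96 \mid H_1)$$
$$= P\left(-1.96 \le \frac{\bar X - \mu_1 + \mu_1 - \mu_0}{\sqrt{\sigma^2/n}} \le 1.96 \,\Big|\, H_1\right) = P\left(-1.96 - \frac{\mu_1 - \mu_0}{\sqrt{\sigma^2/n}} \le T_1(X) \le 1.96 - \frac{\mu_1 - \mu_0}{\sqrt{\sigma^2/n}}\right)$$
$$= P\left(T_1(X) \le 1.96 - \frac{\mu_1 - \mu_0}{\sqrt{\sigma^2/n}}\right) - P\left(T_1(X) < -1.96 - \frac{\mu_1 - \mu_0}{\sqrt{\sigma^2/n}}\right).$$

For the particular value $\mu_1 = 2$,

$$\beta(2) = P(T_1 \le -0.54) - P(T_1 < -4.46) = 0.29.$$

> pnorm(-0.54,0,1)-pnorm(-4.46,0,1)
[1] 0.2945944

By using a computer, many more values $\mu_1 \ne 1.5$ can be considered so as to numerically determine the power curve $1-\beta(\mu_1)$ of the test and to plot the power function

$$\varphi(\mu) = P(\text{Reject } H_0) = \begin{cases}\alpha(\mu) & \text{if } \mu \in \Theta_0\\ 1-\beta(\mu) & \text{if } \mu \in \Theta_1\end{cases}$$

# Population
variance <- 4 # known population variance
# Sample and inference
n <- 100
alpha <- 0.05
theta0 <- 1.5 # Value under the null hypothesis H0
q <- qnorm(1-alpha/2,0,1) # critical value r_{alpha/2}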
# Grid of parameter values, making sure theta0 itself is included
theta <- seq(from=0,to=3,by=0.01)
paramSpace <- sort(unique(c(theta,theta0)))
# Probability of falling in the two-tailed rejection region: 1 - beta(theta)
PowerFunction <- 1 - pnorm(q-(paramSpace-theta0)/sqrt(variance/n),0,1) + pnorm(-q-(paramSpace-theta0)/sqrt(variance/n),0,1)
plot(paramSpace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')

Conclusion: The hypothesis that 1.5y is the mean of the distribution of the lifetime is not rejected. As expected, when the true value is supposed to be 2, far from 1.5, the probability of rejecting 1.5 is $1-\beta(2) = 0.71$, that is, high. This value has been calculated by hand; additionally, after finding the analytical expression of the curve $1-\beta$, also by hand, the computer allows the power function to be plotted. This theoretical curve, not depending on the sample information, is symmetric with respect to $\mu_0 = 1.5$. Remember: statistical results depend on the assumptions, the methods, the certainty and the data.

Exercise 2ht-T: A company produces electric devices operated by a thermostatic control. The standard deviation of the temperature at which these controls actually operate should not exceed 2.0ºF. For a simple random sample of 20 of these controls, the sample quasi-standard deviation of operating temperatures was 2.39ºF. Stating any assumptions you need (write them), test at the 5% level the null hypothesis that the population standard deviation is not larger than 2.0ºF against the alternative that it is. Apply the two methodologies and calculate the type II error at σ = 2.5ºF. Use a computer to plot the power function. On the other hand, between the two alternative hypotheses $H_1: \sigma = \sigma_1 > 2$ or $H_1: \sigma = \sigma_1 \ne 2$, which one would you have selected? Why?
Hint: Be careful to use $S^2$ and $\sigma^2$ wherever you work with a variance instead of a standard deviation.
Based on an exercise of Statistics for Business and Economics. Newbold, P., W.L. Carlson and B.M. Thorne. Pearson.

LINGUISTIC NOTE From: Longman Dictionary of Common Errors. Turton, N.D., and J.B. Heaton. Longman.
actual. real, as opposed to what is believed, planned or expected: 'People think he is over fifty but his actual age is forty-eight.'
'Although buses are supposed to run every fifteen minutes, the actual waiting time can be up to an hour.'
present/current = happening or existing now: 'No one can drive that car in its present condition.' 'Her current boyfriend works for Shell.'
LINGUISTIC NOTE (From: Common Errors in English Usage. Brians, P. William, James & Co.)
"Device" is a noun. A can-opener is a device. "Devise" is a verb. You can devise a plan for opening a can with a sharp rock instead. Only in law is "devise" properly used as a noun, meaning something deeded in a will.
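The one-sided chi-square test for a normal variance used in this exercise can be collected into a small R sketch. Like the solution, it assumes a normal population. The helper names `chisq_var_upper` and `beta_var_upper` are hypothetical and do not come from any package; only base R functions (`qchisq`, `pchisq`) are used. The type II error is evaluated at $\sigma_1^2 = 4.5$, the value used in the solution.

```r
# Sketch (assumes a normal population): right-tailed chi-square test for the
# variance, H0: sigma^2 <= sigma2_0 against H1: sigma^2 > sigma2_0.
# Hypothetical helper names; only base R (stats) functions are used.
chisq_var_upper <- function(n, S, sigma2_0, alpha = 0.05) {
  T0 <- (n - 1) * S^2 / sigma2_0          # statistic evaluated at the data
  crit <- qchisq(1 - alpha, df = n - 1)   # r_alpha, right-tail critical value
  pv <- 1 - pchisq(T0, df = n - 1)        # p-value: probability of the right tail
  list(T0 = T0, crit = crit, pvalue = pv, reject = T0 > crit)
}

# Type II error at sigma2_1 > sigma2_0 (multiply-and-divide trick):
# beta = P(T1 <= r_alpha * sigma2_0 / sigma2_1), with T1 ~ chi-square(n - 1)
beta_var_upper <- function(n, sigma2_0, sigma2_1, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df = n - 1)
  pchisq(crit * sigma2_0 / sigma2_1, df = n - 1)
}

res <- chisq_var_upper(n = 20, S = 2.39, sigma2_0 = 4)
b <- beta_var_upper(n = 20, sigma2_0 = 4, sigma2_1 = 4.5)
print(res)
print(b)
```

With these inputs the sketch gives $T_0(x) \approx 27.13$, $r_{0.05} \approx 30.14$, $pv \approx 0.10$ (so $H_0$ is not rejected) and $\beta \approx 0.89$.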

Discussion: Because of the mathematical theorems available, we are able to study the variance only for normally distributed random variables. Thus, we need the supposition that the temperature follows a normal distribution. In practice, this normality should be evaluated.
Statistic: We know that there is one normal population and that the population mean is unknown. Hence the following dimensionless statistic, involving the sample quasivariance, is chosen:
$$T(X;\sigma) = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
We will work with the two following particular cases:
$$T_0(X) = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1} \qquad\text{and}\qquad T_1(X) = \frac{(n-1)S^2}{\sigma_1^2} \sim \chi^2_{n-1}$$
To make the decision, we need to evaluate the statistic $T_0$ at the specific data available $x$:
$$T_0(x) = \frac{(20-1)\,(2.39\,\text{ºF})^2}{(2\,\text{ºF})^2} = 27.13$$
Hypothesis test
Hypotheses: $H_0: \sigma^2 = \sigma_0^2 \le 4$ and $H_1: \sigma^2 = \sigma_1^2 > 4$
Then,
Decision: To determine the rejection region under $H_0$, the critical value $a$ is found by applying the definition of type I error, with $\alpha = 0.05$ at $\sigma_0^2 = 4$ºF²:
$$\alpha = 0.05 = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(X;\theta) \in R_c \mid H_0) = P(T_0(X) > a) \;\Rightarrow\; a = r_\alpha = r_{0.05} = 30.14$$
$$R_c = \{T_0(X) > 30.14\}$$
To make the final decision: $T_0(x) = 27.13 < 30.14 \;\Rightarrow\; T_0(x) \notin R_c \;\Rightarrow\; H_0$ is not rejected.
The second methodology requires the calculation of the p-value:
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0(X) > T_0(x)) = P(T_0(X) > 27.13) = 0.10$$
$pv = 0.10 > 0.05 = \alpha \;\Rightarrow\; H_0$ is not rejected.
> 1 - pchisq(27.13, 20-1)
[1] 0.1016631
Type II error: To calculate $\beta$, we have to work under $H_1$, that is, with $T_1$. Since the critical region is already expressed in terms of $T_0$, the mathematical trick of multiplying and dividing by the same quantity is applied:
$$\beta(\sigma_1^2) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0(X) \notin R_c \mid H_1) = P(T_0(X) \le 30.14 \mid H_1)$$
$$= P\left(\frac{(n-1)S^2}{\sigma_0^2} \le 30.14 \,\middle|\, H_1\right) = P\left(\frac{(n-1)S^2}{\sigma_1^2} \le 30.14\,\frac{\sigma_0^2}{\sigma_1^2} \,\middle|\, H_1\right) = P\left(T_1(X) \le 30.14\,\frac{\sigma_0^2}{\sigma_1^2}\right)$$
For the particular value $\sigma_1^2 = 4.5$ºF²,
$$\beta(4.5) = P\left(T_1(X) \le 30.14 \cdot \frac{4}{4.5}\right) = P(T_1(X) \le 26.79) = 0.89$$
> pchisq(26.79, 20-1)
[1] 0.8903596
By using a computer, many other values $\sigma_1^2 > 4$ can be considered so as to numerically determine the power curve $1 - \beta(\sigma_1^2)$ of the test and to plot the power function.
$$\phi(\sigma^2) = P(\text{Reject } H_0) = \begin{cases} \alpha(\sigma^2) & \text{if } \sigma^2 \in \Theta_0 \\ 1 - \beta(\sigma^2) & \text{if } \sigma^2 \in \Theta_1 \end{cases}$$
# Sample and inference
n = 20
alpha = 0.05
theta0 = 4 # Value under the null hypothesis H0
q = qchisq(1-alpha,n-1)
theta = seq(from=4,to=15,0.01)
paramspace = sort(unique(c(theta,theta0)))
PowerFunction = 1 - pchisq(q*theta0/paramspace, n-1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
Conclusion: The null hypothesis $H_0: \sigma = \sigma_0 \le 2$ is not rejected. If any of the factors on which the decision depends (assumptions, methods, certainty or data) were different, the decision might be the opposite. As regards the most appropriate alternative hypothesis, the value $S = 2.39 > 2$ suggests that the test with $H_1: \sigma > 2$ is more powerful than the test with $H_1: \sigma \ne 2$. The test with $H_1: \sigma < 2$ against the equality would be the least powerful, because in both methodologies $H_0$ is the default hypothesis and here the data tend to support $H_0$. Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.
My notes:
Exercise 3ht-T
Let $X_1, \dots, X_n$ be a simple random sample with $n = 25$ data taken from a normal population variable $X$. The sample information is summarized in

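$$\sum_{j=1}^{25} x_j = 105 \qquad\text{and}\qquad \sum_{j=1}^{25} x_j^2 = 579$$
a) Should the hypothesis $H_0: \sigma^2 = 4$ be rejected when $H_1: \sigma^2 > 4$ and $\alpha = 0.05$? Calculate $\beta(5)$.
b) And when $H_1: \sigma^2 \ne 4$ and $\alpha = 0.05$? Calculate $\beta(5)$.
Use a computer to plot the power function.
Discussion: The supposition that the normal distribution is appropriate to model $X$ should be statistically proved. This statement is theoretical.
Statistic: We know that there is one normal population and that the population mean is unknown. Hence the following statistic is selected:
$$T(X;\sigma) = \frac{n s^2}{\sigma^2} \sim \chi^2_{n-1}$$
We will work with the two following particular cases:
$$T_0(X) = \frac{n s^2}{\sigma_0^2} \sim \chi^2_{n-1} \qquad\text{and}\qquad T_1(X) = \frac{n s^2}{\sigma_1^2} \sim \chi^2_{n-1}$$
To make the decision, we need to evaluate the statistic at the specific data available $x$:
$$T_0(x) = \frac{1}{\sigma_0^2}\left[\sum_{j=1}^{25} x_j^2 - 25\,\bar x^2\right] = \frac{25}{4}\left[\frac{579}{25} - \left(\frac{105}{25}\right)^2\right] = \frac{25 \cdot 5.52}{4} = 34.5$$
To calculate the sample variance, the general property $s^2 = \frac{1}{n}\sum_{j=1}^n x_j^2 - \bar x^2$ has been used.
a) One-tailed alternative hypothesis
Hypotheses: $H_0: \sigma^2 = \sigma_0^2 = 4$ and $H_1: \sigma^2 = \sigma_1^2 > 4$
For these hypotheses,
Decision: To determine the rejection region under $H_0$, the critical value $a$ is found by applying the definition of type I error, with $\alpha = 0.05$ at $\sigma_0^2 = 4$:
$$\alpha = 0.05 = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(X;\theta) \in R_c \mid H_0) = P(T_0(X) > a) \;\Rightarrow\; a = r_\alpha = r_{0.05} = 36.4$$
$$R_c = \{T_0(X) > 36.4\}$$
To make the final decision: $T_0(x) = 34.5 < 36.4 \;\Rightarrow\; T_0(x) \notin R_c \;\Rightarrow\; H_0$ is not rejected.
The second methodology requires the calculation of the p-value:
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0(X) > T_0(x)) = P(T_0(X) > 34.5) \approx 0.076$$
> 1 - pchisq(34.5, 25-1)
$pv \approx 0.076 > 0.05 = \alpha \;\Rightarrow\; H_0$ is not rejected.
Type II error: To calculate $\beta$, we have to work under $H_1$, that is, with $T_1$. Since the critical region is expressed in terms of $T_0$, the mathematical trick of multiplying and dividing by the same quantity is applied:
$$\beta(\sigma_1^2) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0(X) \notin R_c \mid H_1) = P(T_0(X) \le 36.4 \mid H_1)$$
$$= P\left(\frac{n s^2}{\sigma_0^2} \le 36.4 \,\middle|\, H_1\right) = P\left(\frac{n s^2}{\sigma_1^2} \le 36.4\,\frac{\sigma_0^2}{\sigma_1^2} \,\middle|\, H_1\right) = P\left(T_1(X) \le 36.4\,\frac{\sigma_0^2}{\sigma_1^2}\right)$$
For the particular value $\sigma_1^2 = 5$,
$$\beta(5) = P\left(T_1(X) \le 36.4 \cdot \frac{4}{5}\right) = P(T_1(X) \le 29.1) = 0.78$$
> pchisq(29.1, 25-1)
By using a computer, many other values $\sigma_1^2 > 4$ can be considered so as to numerically determine the power curve $1 - \beta(\sigma_1^2)$ of the test and to plot the power function.
$$\phi(\sigma^2) = P(\text{Reject } H_0) = \begin{cases} \alpha(\sigma^2) & \text{if } \sigma^2 \in \Theta_0 \\ 1 - \beta(\sigma^2) & \text{if } \sigma^2 \in \Theta_1 \end{cases}$$
# Sample and inference
n = 25
alpha = 0.05
theta0 = 4 # Value under the null hypothesis H0
q = qchisq(1-alpha,n-1)
theta = seq(from=4,to=15,0.01)
paramspace = sort(unique(c(theta,theta0)))
PowerFunction = 1 - pchisq(q*theta0/paramspace, n-1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
b) Two-tailed alternative hypothesis
Hypotheses: $H_0: \sigma^2 = \sigma_0^2 = 4$ and $H_1: \sigma^2 = \sigma_1^2 \ne 4$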

For these hypotheses,
Decision: Now there are two tails, determined by two critical values $a_1$ and $a_2$. They are found by applying the definition of type I error, with $\alpha = 0.05$ at $\sigma_0^2 = 4$, and the criterion of leaving half the probability in each tail:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(X;\theta) \in R_c \mid H_0) = P(T_0(X) < a_1) + P(T_0(X) > a_2)$$
We always consider two tails with the same probability:
$$\frac{\alpha}{2} = P(T_0(X) < a_1) \;\Rightarrow\; a_1 = l_{\alpha/2} = 12.4 \qquad\qquad \frac{\alpha}{2} = P(T_0(X) > a_2) \;\Rightarrow\; a_2 = r_{\alpha/2} = 39.4$$
$$R_c = \{T_0(X) < 12.4\} \cup \{T_0(X) > 39.4\}$$
To make the final decision: $T_0(x) = 34.5 \;\Rightarrow\; T_0(x) \notin R_c \;\Rightarrow\; H_0$ is not rejected.
To base the decision on the p-value, we calculate twice the probability of the tail:
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = 2\,P(T_0(X) > T_0(x)) = 2\,P(T_0(X) > 34.5) \approx 2 \cdot 0.076 \approx 0.15$$
> 1 - pchisq(34.5, 25-1)
$pv \approx 0.15 > 0.05 = \alpha \;\Rightarrow\; H_0$ is not rejected.
Note: The wrong tail would have been selected if we had obtained a p-value bigger than 1.
Type II error: To calculate $\beta$,
$$\beta(\sigma_1^2) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T(X;\theta) \notin R_c \mid H_1) = 1 - P\left(\{T_0(X) < 12.4\} \cup \{T_0(X) > 39.4\} \mid H_1\right)$$
$$= P\left(12.4 \le \frac{n s^2}{\sigma_0^2} \le 39.4 \,\middle|\, H_1\right) = P\left(12.4\,\frac{\sigma_0^2}{\sigma_1^2} \le \frac{n s^2}{\sigma_1^2} \le 39.4\,\frac{\sigma_0^2}{\sigma_1^2} \,\middle|\, H_1\right) = P\left(T_1(X) \le 39.4\,\frac{\sigma_0^2}{\sigma_1^2}\right) - P\left(T_1(X) < 12.4\,\frac{\sigma_0^2}{\sigma_1^2}\right)$$
For the particular value $\sigma_1^2 = 5$,
$$\beta(5) = P(T_1(X) \le 31.5) - P(T_1(X) < 9.92) \approx 0.86 - 0.005 \approx 0.855$$
> pchisq(c(9.92, 31.5), 25-1)
Again, the computer allows the power function to be plotted.
# Sample and inference
n = 25
alpha = 0.05
theta0 = 4 # Value under the null hypothesis H0
q = qchisq(c(alpha/2,1-alpha/2),n-1)
theta = seq(from=0,to=15,0.01)
paramspace = sort(unique(c(theta,theta0)))
PowerFunction = 1 - pchisq(q[2]*theta0/paramspace, n-1) + pchisq(q[1]*theta0/paramspace, n-1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
Comparison of the power functions: For the one-tailed test, the power of the test at $\sigma^2 = 5$ is $1 - \beta(5) = 1 - 0.78 = 0.22$, while for the two-tailed test it is $1 - \beta(5) \approx 1 - 0.855 \approx 0.15$. As expected, the latter test has smaller power (higher type II error), because the former test uses additional information when one tail is discarded beforehand. Now we compare the power functions of the two tests graphically, for the common values $\sigma^2 > 4$, by using the code
# Sample and inference
n = 25
alpha = 0.05
theta0 = 4 # Value under the null hypothesis H0
q = qchisq(c(alpha/2,1-alpha/2),n-1)
theta = seq(from=0,to=15,0.01)
paramspace1 = sort(unique(c(theta,theta0)))
PowerFunction1 = 1 - pchisq(q[2]*theta0/paramspace1, n-1) + pchisq(q[1]*theta0/paramspace1, n-1)
q = qchisq(1-alpha,n-1)
theta = seq(from=4,to=15,0.01)
paramspace2 = sort(unique(c(theta,theta0)))
PowerFunction2 = 1 - pchisq(q*theta0/paramspace2, n-1)
plot(paramspace1, PowerFunction1, xlim=c(0,15), xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
lines(paramspace2, PowerFunction2, lty=2)
It can be noticed that the curve of the one-sided test is over the curve of the two-sided test for any $\sigma^2 > 4$,

which makes it uniformly more powerful. In this exercise, we could have calculated the estimator $S^2$ of $\sigma^2$ from the sample information. Its value would show whether it is far from 4 and, therefore, whether one of the two one-sided tests should be considered better.
Conclusion: The hypothesis that the population variance is equal to 4 is not rejected in either of the two sections. Although it has not happened in this case, different decisions may be made for the one- and two-tailed cases. Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.
My notes:
Exercise 4ht-T
Imagine that you are hired as a cook. Not an ordinary one but a statistical cook. For a normal population, consider testing the two hypotheses
$$H_0: \sigma^2 = \sigma_0^2 = 4 \qquad H_1: \sigma^2 = \sigma_1^2 > 4$$
A data sample $x$ of size $n = 11$ such that $S^2 = 7.6\,u^2$, together with the significance $\alpha = 0.05$, has led to rejecting the null hypothesis because
$$r_{0.05} = 18.31 < 19 = T_0(x)$$
where $T_0$ is the usual statistic. A decision depends on several factors: the methodology, the statistic $T_0$, the form of the alternative hypothesis $H_1$, the significance $\alpha$ and the data $x$.
The chef (your boss) wants the null hypothesis $H_0$ not to be rejected. Find three different ways to scientifically make the opposite decision by changing any of the previous factors. Give qualitative explanations and, if possible, quantitative ones.
Discussion: Metaphorically, Statistics can be thought of as the kitchen with its utensils and appliances, the first two factors as the recipe, and the next three items as the ingredients. If $H_1$, $\alpha$ or $x$ are inappropriate, there is little to do, and it does not matter how good the kitchen, the recipe and you are. Our statistical knowledge allows us to change only the last three elements. The statistic to study the variance of a normal population is
$$T(X) = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
so, under $H_0$,
$$T_0(x) = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(11-1) \cdot 7.6\,u^2}{4\,u^2} = \frac{76}{4} = 19$$

Qualitative reasoning: By looking at the figure above, we consider that:
A) If a two-tailed test is considered ($H_1: \sigma^2 \ne 4$), the critical value would be $r_{\alpha/2}$ instead of $r_\alpha$. Then the evaluation $T_0(x)$ may not lie in the rejection region (tails).
B) Equivalently, for the original one-tailed test, the critical value $r_\alpha$ increases when the significance $\alpha$ decreases, possibly with the same consequence as in the previous item.
C) Finally, keep the same one-sided alternative hypothesis and significance, that is, the same critical value $r_\alpha$. The evaluation $T_0(x)$ would lie outside the critical region (tail) if the data $x$ (the values themselves or only the sample size) are such that $T_0(x) < r_\alpha = 18.31$.
D) Additionally, a fourth way could consist of some combinations of the previous ways.
Quantitative reasoning: The previous qualitative explanations can be supported with calculations.
A) For the two-tailed test, the critical value would now be $r_{0.05/2} = r_{0.025} = 20.48$. Then
$$T_0(x) = 19 < 20.48 = r_{0.025} \;\Rightarrow\; T_0(x) \notin R_c \;\Rightarrow\; H_0 \text{ is not rejected.}$$
B) The same effect is obtained if, for the original one-tailed $H_1$, the significance is taken to be 0.025 instead of 0.05. Any other value smaller than 0.025 would lead to the same result. Is 0.025 (suggested by the previous item) the limiting value? The answer is given by the p-value, since it is sometimes defined as the smallest significance level at which the null hypothesis is rejected. Then, since
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0(X) > 19) = 0.0403$$
> 1 - pchisq(19, 11-1)
[1] 0.04026268
for any $\alpha < 0.0403$ it would hold that $0.0403 = pv > \alpha \;\Rightarrow\; H_0$ is not rejected.
C) Finally, consider the original test with the same value of $n$. Since
$$\tilde T_0(x) = \frac{(n-1)\tilde S^2}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma_0^2}\cdot\frac{\tilde S^2}{S^2} = 19\,\frac{\tilde S^2}{7.6\,u^2} < 18.31 = r_\alpha,$$
the opposite decision would be made for any sample quasivariance such that
$$\tilde S^2 < \frac{18.31 \cdot 7.6\,u^2}{19} = 7.32\,u^2 \;\Rightarrow\; \tilde T_0(x) \notin R_c \;\Rightarrow\; H_0 \text{ is not rejected.}$$
On the other hand, consider the original test with the same value of $S^2$. Since
$$\tilde T_0(x) = \frac{(\tilde n - 1)S^2}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma_0^2}\cdot\frac{\tilde n - 1}{n-1} = 19\,\frac{\tilde n - 1}{10} < 18.31 = r_\alpha,$$
the opposite decision would be made for any sample size such that
$$\tilde n < 1 + \frac{18.31 \cdot 10}{19} = 10.6368 \;\Rightarrow\; \tilde T_0(x) \notin R_c \;\Rightarrow\; H_0 \text{ is not rejected,}$$
that is, for $\tilde n \le 10$.
D) Some combinations can easily be proved to lead to not rejecting $H_0$.
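The numbers behind ways A, B and C can be recomputed in a few lines of base R. This is a sketch under the exercise's own setting ($n = 11$, $S^2 = 7.6\,u^2$, $\sigma_0^2 = 4\,u^2$, $\alpha = 0.05$), with units dropped. The variable names are illustrative only.

```r
# Sketch: numbers behind ways A, B and C of the "statistical cook" exercise
n <- 11; S2 <- 7.6; sigma2_0 <- 4; alpha <- 0.05
T0 <- (n - 1) * S2 / sigma2_0               # 19, the value that led to rejection
r_alpha <- qchisq(1 - alpha, df = n - 1)    # one-tailed critical value, about 18.31

# Way A: a two-tailed test moves the right critical value to r_{alpha/2}
r_half <- qchisq(1 - alpha / 2, df = n - 1) # about 20.48, so T0 < r_half

# Way B: the p-value is the smallest significance that still rejects H0,
# so any alpha below it leads to not rejecting H0
pv <- 1 - pchisq(T0, df = n - 1)            # about 0.0403

# Way C: limits that keep T0 below r_alpha
S2_max <- r_alpha * sigma2_0 / (n - 1)      # quasivariance bound (same n), about 7.32
n_max  <- 1 + r_alpha * sigma2_0 / S2       # sample-size bound (same S2), about 10.64
largest_n <- ceiling(n_max) - 1             # largest integer sample size below the bound

cat("T0 =", T0, " r_alpha =", round(r_alpha, 2), " r_half =", round(r_half, 2), "\n")
cat("pv =", round(pv, 4), " S2_max =", round(S2_max, 2), " largest n =", largest_n, "\n")
```

The sample-size bound is slightly below the text's 10.6368 only because the sketch uses the unrounded critical value instead of 18.31. Either way, the largest admissible sample size is 10.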
Conclusion: This exercise highlights how careful one must be in either writing or reading statistical works. My notes:

Exercise 5ht-T
A variable is supposed to be normally distributed in two independent biological populations. The two population variances must be compared. Information was gathered through simple random samples of sizes $n_X = 11$ and $n_Y = 10$, respectively, and we are given the values of the estimators
$$S_X^2 = \frac{1}{n_X - 1}\sum_{j=1}^{n_X}(x_j - \bar x)^2 = 6.8 \qquad\qquad s_Y^2 = \frac{1}{n_Y}\sum_{j=1}^{n_Y}(y_j - \bar y)^2 = 7.1$$
For $\alpha = 0.1$, test:
a) $H_0: \sigma_X = \sigma_Y$ against $H_1: \sigma_X < \sigma_Y$
b) $H_0: \sigma_X = \sigma_Y$ against $H_1: \sigma_X > \sigma_Y$
c) $H_0: \sigma_X = \sigma_Y$ against $H_1: \sigma_X \ne \sigma_Y$
In each section, calculate the analytical expression of the type II error and plot the power function by using a computer.
Discussion: In a real-world situation, suppositions should be proved. We must pay careful attention to the details: the sample quasivariance is provided for one group, while the sample variance is given for the other.
Statistic: From the information in the statement, there are two independent normal populations and the population means are unknown. Therefore the statistic
$$T(X,Y;\sigma_X,\sigma_Y) = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} = \frac{S_X^2\,\sigma_Y^2}{S_Y^2\,\sigma_X^2} \sim F_{n_X-1,\,n_Y-1}$$
is selected from a table of statistics (e.g. in [T]). It will be used in two forms, writing $\sigma_X^2/\sigma_Y^2 = \theta$:
$$T_0(X,Y) = \frac{S_X^2}{S_Y^2} \sim F_{n_X-1,\,n_Y-1} \qquad\text{and}\qquad T_1(X,Y) = \frac{S_X^2}{\theta_1 S_Y^2} \sim F_{n_X-1,\,n_Y-1}$$
On the other hand, the pooled sample variance $S_p^2$ should not be used even under $H_0: \sigma_X = \sigma_Y = \sigma$, because $T_0 = S_p^2/S_p^2 = 1$ whatever the data are. To apply either of the two methodologies, we need to evaluate $T_0$ at the samples $x$ and $y$:
$$T_0(x,y) = \frac{S_X^2}{S_Y^2} = \frac{6.8}{\frac{10}{10-1}\cdot 7.1} = 0.86$$
We were given the sample quasivariance of population X but the sample variance of population Y. Therefore the general property $n_Y s_Y^2 = (n_Y - 1)S_Y^2$ has been used to calculate $S_Y^2$.
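The conversion from the sample variance of Y to its quasivariance, and the resulting value of $T_0(x,y)$, can be checked with a short R sketch. The variable names are illustrative.

```r
# Sketch: evaluating T0 = S2x / S2y when Y is summarized by its sample variance
nx <- 11; ny <- 10
S2x <- 6.8                     # sample quasivariance of X (divisor nx - 1)
s2y <- 7.1                     # sample variance of Y (divisor ny)
S2y <- ny * s2y / (ny - 1)     # property n * s^2 = (n - 1) * S^2
T0 <- S2x / S2y
cat("S2y =", round(S2y, 4), " T0 =", round(T0, 2), "\n")
```

Forgetting the conversion would give $6.8/7.1 \approx 0.96$ instead of about 0.86. This is the detail the discussion warns about.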

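a) One-tailed alternative hypothesis $\sigma_X < \sigma_Y$
Hypotheses: $H_0: \sigma_X^2 = \sigma_Y^2$ and $H_1: \sigma_X^2 < \sigma_Y^2$. Or, equivalently, $H_0: \theta_0 = \sigma_X^2/\sigma_Y^2 = 1$ and $H_1: \theta_1 = \sigma_X^2/\sigma_Y^2 < 1$.
For these hypotheses,
Decision: To determine the critical region under $H_0$, the critical value $a$ is found by applying the definition of type I error, with $\alpha = 0.1$ at $\theta_0 = 1$:
$$\alpha = 0.1 = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(X,Y) < a \mid H_0) = P(T_0(X,Y) < a)$$
$$0.1 = P\left(\frac{1}{T_0(X,Y)} > \frac{1}{a}\right) \;\Rightarrow\; \frac{1}{a} = r_\alpha = 2.35 \;\Rightarrow\; a = \frac{1}{2.35} = 0.43$$
From the definition of the F distribution, it is easy to see that if $X$ follows an $F_{k_1,k_2}$, then $1/X$ follows an $F_{k_2,k_1}$. We use this property to consult our table.
$$R_c = \{T_0(X,Y) < 0.43\}$$
To make the final decision about the hypotheses: $T_0(x,y) = 0.86 \;\Rightarrow\; T_0(x,y) \notin R_c \;\Rightarrow\; H_0$ is not rejected.
The second methodology requires the calculation of the p-value:
$$pv = P((X,Y) \text{ more rejecting than } (x,y) \mid H_0 \text{ true}) = P(T_0(X,Y) < T_0(x,y)) = P(T_0(X,Y) < 0.86) = 0.41$$
> pf(0.86, 11-1, 10-1)
$pv = 0.41 > 0.1 = \alpha \;\Rightarrow\; H_0$ is not rejected.
Power function: To calculate $\beta$, we have to work under $H_1$, that is, with $T_1$. Since in this case the critical region is already expressed in terms of $T_0$, the mathematical trick of multiplying and dividing by the same quantity is applied:
$$\beta(\theta_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0(X,Y) \notin R_c \mid H_1) = P(T_0(X,Y) \ge 0.43 \mid H_1)$$
$$= P\left(\frac{S_X^2}{S_Y^2} \ge 0.43 \,\middle|\, H_1\right) = P\left(\frac{S_X^2}{\theta_1 S_Y^2} \ge \frac{0.43}{\theta_1} \,\middle|\, H_1\right) = P\left(T_1(X,Y) \ge \frac{0.43}{\theta_1}\right) = 1 - P\left(T_1(X,Y) < \frac{0.43}{\theta_1}\right)$$
By using a computer, many values $\theta_1$ can be considered so as to determine the power curve $1 - \beta(\theta_1)$ of the test and to plot the power function.
$$\phi(\theta) = P(\text{Reject } H_0) = \begin{cases} \alpha(\theta) & \text{if } \theta \in \Theta_0 \\ 1 - \beta(\theta) & \text{if } \theta \in \Theta_1 \end{cases}$$
# Sample and inference
nx = 11; ny = 10
alpha = 0.1
theta0 = 1
q = qf(alpha,nx-1,ny-1)
theta = seq(from=0,to=1,0.01)
paramspace = sort(unique(c(theta,theta0)))
PowerFunction = pf(q/paramspace, nx-1, ny-1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')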
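Section a) can be reproduced with base R's F-distribution functions (`qf`, `pf`). The sketch also checks the reciprocal property used to read the table. It then evaluates the power $1 - \beta(\theta_1) = P(T_1 < a/\theta_1)$. The value $\theta_1 = 0.5$ is only an illustrative choice, not one requested in the exercise.

```r
# Sketch: left-tailed F test of sigma_X^2 / sigma_Y^2 = 1 against < 1
nx <- 11; ny <- 10; alpha <- 0.1
T0 <- 6.8 / (ny * 7.1 / (ny - 1))             # about 0.86

a <- qf(alpha, nx - 1, ny - 1)                # left critical value, about 0.43
a_table <- 1 / qf(1 - alpha, ny - 1, nx - 1)  # same value via 1/F ~ F(k2, k1)
pv <- pf(T0, nx - 1, ny - 1)                  # left-tail p-value, about 0.41
reject <- T0 < a

# Power of this test at theta1: P(T1 < a / theta1), with T1 ~ F(nx - 1, ny - 1)
power_left <- function(theta1) pf(a / theta1, nx - 1, ny - 1)

cat("a =", round(a, 3), " pv =", round(pv, 3), " reject =", reject, "\n")
cat("power at theta1 = 0.5 (illustrative):", round(power_left(0.5), 3), "\n")
```

At $\theta_1 = 1$ the function returns exactly $\alpha$, which is a quick sanity check that the power curve starts from the significance level.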

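b) One-tailed alternative hypothesis $\sigma_X > \sigma_Y$
Hypotheses: $H_0: \sigma_X^2 = \sigma_Y^2$ and $H_1: \sigma_X^2 > \sigma_Y^2$. Or, equivalently, $H_0: \theta_0 = \sigma_X^2/\sigma_Y^2 = 1$ and $H_1: \theta_1 = \sigma_X^2/\sigma_Y^2 > 1$.
For these hypotheses,
Decision: To apply the methodology based on the rejection region, the critical value $a$ is found by applying the definition of type I error, with $\alpha = 0.1$ at $\theta_0 = 1$:
$$\alpha = 0.1 = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(X,Y) > a \mid H_0) = P(T_0(X,Y) > a) \;\Rightarrow\; a = r_\alpha = 2.42$$
$$R_c = \{T_0(X,Y) > 2.42\}$$
The final decision is: $T_0(x,y) = 0.86 \;\Rightarrow\; T_0(x,y) \notin R_c \;\Rightarrow\; H_0$ is not rejected.
The second methodology requires the calculation of the p-value:
$$pv = P((X,Y) \text{ more rejecting than } (x,y) \mid H_0 \text{ true}) = P(T_0(X,Y) > T_0(x,y)) = P(T_0(X,Y) > 0.86) = 1 - 0.41 = 0.59$$
$pv = 0.59 > 0.1 = \alpha \;\Rightarrow\; H_0$ is not rejected.
> 1 - pf(0.86, 11-1, 10-1)
Power function: Now
$$\beta(\theta_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0(X,Y) \notin R_c \mid H_1) = P(T_0(X,Y) \le 2.42 \mid H_1)$$
$$= P\left(\frac{S_X^2}{S_Y^2} \le 2.42 \,\middle|\, H_1\right) = P\left(\frac{S_X^2}{\theta_1 S_Y^2} \le \frac{2.42}{\theta_1} \,\middle|\, H_1\right) = P\left(T_1(X,Y) \le \frac{2.42}{\theta_1}\right)$$
By using a computer, many values $\theta_1$ can be considered so as to plot the power function.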

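# Sample and inference
nx = 11; ny = 10
alpha = 0.1
theta0 = 1
q = qf(1-alpha,nx-1,ny-1)
theta = seq(from=1,to=5,0.01)
paramspace = sort(unique(c(theta,theta0)))
PowerFunction = 1 - pf(q/paramspace, nx-1, ny-1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
c) Two-tailed alternative hypothesis $\sigma_X \ne \sigma_Y$
Hypotheses: $H_0: \sigma_X^2 = \sigma_Y^2$ and $H_1: \sigma_X^2 \ne \sigma_Y^2$. Or, equivalently, $H_0: \theta_0 = \sigma_X^2/\sigma_Y^2 = 1$ and $H_1: \theta_1 = \sigma_X^2/\sigma_Y^2 \ne 1$.
For these hypotheses,
Decision: For the first methodology, the critical region must be determined by applying the definition of type I error, with $\alpha = 0.1$ at $\theta_0 = 1$, and the criterion of leaving half the probability in each tail:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0(X,Y) < a_1) + P(T_0(X,Y) > a_2)$$
$$\frac{\alpha}{2} = P(T_0(X,Y) < a_1) \;\Rightarrow\; a_1 = l_{\alpha/2} = 0.33 \qquad\qquad \frac{\alpha}{2} = P(T_0(X,Y) > a_2) \;\Rightarrow\; a_2 = r_{\alpha/2} = 3.14$$
$$R_c = \{T_0(X,Y) < 0.33\} \cup \{T_0(X,Y) > 3.14\}$$
> qf(c(0.05, 0.95), 11-1, 10-1)
[1] 0.3310838 3.1372801
The decision depends on whether the evaluation of $T_0$ is in the rejection region: $T_0(x,y) = 0.86 \;\Rightarrow\; T_0(x,y) \notin R_c \;\Rightarrow\; H_0$ is not rejected.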
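For section c), a base-R sketch computes the two critical values and the doubled p-value. It chooses the tail by comparing $T_0(x,y)$ with the median of $F_{10,9}$. The variable names are illustrative.

```r
# Sketch: two-tailed F test, H1: sigma_X^2 / sigma_Y^2 != 1
nx <- 11; ny <- 10; alpha <- 0.1
T0 <- 6.8 / (ny * 7.1 / (ny - 1))

crit <- qf(c(alpha / 2, 1 - alpha / 2), nx - 1, ny - 1)  # about 0.331 and 3.137
reject <- T0 < crit[1] || T0 > crit[2]

# p-value: double the probability of the tail in which T0 lies,
# choosing that tail by comparing T0 with the median of F(nx - 1, ny - 1)
med <- qf(0.5, nx - 1, ny - 1)
tail_prob <- if (T0 < med) pf(T0, nx - 1, ny - 1) else 1 - pf(T0, nx - 1, ny - 1)
pv <- 2 * tail_prob                                       # about 0.82

cat("critical values:", round(crit, 3), " median:", round(med, 4), "\n")
cat("pv =", round(pv, 2), " reject =", reject, "\n")
```

Choosing the tail by the median rather than by the value 1 matters only when $T_0$ falls between 1 and the median. The F distribution is not symmetric, so its median is not exactly 1.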

To apply the methodology based on the p-value, we calculate the median, qf(0.5, 11-1, 10-1) ≈ 1.0077. Since $T_0(x,y) = 0.86$ is in the left-hand tail:
$$pv = P((X,Y) \text{ more rejecting than } (x,y) \mid H_0 \text{ true}) = 2\,P(T_0(X,Y) < T_0(x,y)) = 2\,P(T_0(X,Y) < 0.86) = 2 \cdot 0.41 = 0.82$$
$pv = 0.82 > 0.1 = \alpha \;\Rightarrow\; H_0$ is not rejected.
If you cannot calculate the median, try the tail you trust most. Switch to the other tail if doubling its probability gives a value bigger than 1.
Power function: Now
$$\beta(\theta_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0(X,Y) \notin R_c \mid H_1) = P(0.33 \le T_0(X,Y) \le 3.14 \mid H_1)$$
$$= P\left(0.33 \le \frac{S_X^2}{S_Y^2} \le 3.14 \,\middle|\, H_1\right) = P\left(\frac{0.33}{\theta_1} \le \frac{S_X^2}{\theta_1 S_Y^2} \le \frac{3.14}{\theta_1} \,\middle|\, H_1\right) = P\left(T_1(X,Y) \le \frac{3.14}{\theta_1}\right) - P\left(T_1(X,Y) < \frac{0.33}{\theta_1}\right)$$
By using a computer, many values $\theta_1$ can be considered in order to plot the power function.
# Sample and inference
nx = 11; ny = 10
alpha = 0.1
theta0 = 1
q = qf(c(alpha/2,1-alpha/2),nx-1,ny-1)
theta = seq(from=0,to=5,0.01)
paramspace = sort(unique(c(theta,theta0)))
PowerFunction = 1 - pf(q[2]/paramspace, nx-1, ny-1) + pf(q[1]/paramspace, nx-1, ny-1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
Comparison of the power functions: Now we compare the power functions of the three tests graphically, by using the code
# Sample and inference
nx = 11; ny = 10
alpha = 0.1
theta0 = 1
q = qf(c(alpha/2,1-alpha/2),nx-1,ny-1)
theta = seq(from=0,to=5,0.01)
paramspace1 = sort(unique(c(theta,theta0)))
PowerFunction1 = 1 - pf(q[2]/paramspace1, nx-1, ny-1) + pf(q[1]/paramspace1, nx-1, ny-1)
q = qf(alpha,nx-1,ny-1)
theta = seq(from=0,to=1,0.01)
paramspace2 = sort(unique(c(theta,theta0)))
PowerFunction2 = pf(q/paramspace2, nx-1, ny-1)
q = qf(1-alpha,nx-1,ny-1)
theta = seq(from=1,to=5,0.01)
paramspace3 = sort(unique(c(theta,theta0)))
PowerFunction3 = 1 - pf(q/paramspace3, nx-1, ny-1)
plot(paramspace1, PowerFunction1, xlim=c(0,5), xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
lines(paramspace2, PowerFunction2, lty=2)
lines(paramspace3, PowerFunction3, lty=2)
It can be seen that the curves of the one-sided tests lie over the curve of the two-sided test for any $\theta$ in their regions. Each one-sided test has more power than the two-sided test, since additional information is used when one tail is discarded. Hence each of the two one-sided tests is uniformly more powerful than the two-sided test in their respective common domains.
Conclusion: The hypothesis that the population variance is equal in the two biological populations is not rejected when tested against any of the three alternative hypotheses. Although it has not happened in this case, different decisions can be made for the one- and two-tailed tests. In this exercise, the empirical value $T_0(x,y) = S_X^2/S_Y^2 = 0.86$ suggests the alternative hypothesis $H_1: \sigma_X^2/\sigma_Y^2 < 1$. Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.
My notes:
Exercise 6ht-T
Two simple random samples of 700 citizens of Italy and Russia yielded, respectively, that 53% of Italian people and 47% of Russian people wish to visit Spain within the next ten years. Should we conclude, with confidence 0.99, that the Italians' desire is higher than the Russians'? Determine the critical region and make a decision. What is the type I error? Calculate the p-value and apply the methodology based on the p-value to make a decision.
1) Allocate the question in the alternative hypothesis. Calculate the type II error for the value 0.1.
2) Allocate the question in the null hypothesis. Calculate the type II error for the value −0.1.
Use a computer to plot the power function.

Discussion: After reading the statement (twice, if necessary), we realize that there are two independent populations. Their citizens have been asked a question with two possible answers (a dichotomous situation). Each individual can therefore be modeled through a Bernoulli variable. In practice, the implicit supposition that the same parameter $\eta$ governs the behaviour of all the individuals should still be evaluated for each population. This is a sort of homogeneity check, to analyse whether or not several subpopulations should be considered. The independence of the two populations should be studied as well. Either way, in this exercise we will merely apply the testing methodologies.
The sample proportions of those who said 'yes' are given: $\hat\eta_I = 0.53$ and $\hat\eta_R = 0.47$, respectively. Let $\eta_I$ and $\eta_R$ be the theoretical proportions of the populations, that is, the quantities we want to compare. Then we need to test the hypothesis $\eta_I > \eta_R$ (one-tailed test). Should this hypothesis be written as a null or as an alternative hypothesis? In general, since we fix the type I error in our methodologies, strong sample evidence is necessary to reject $H_0$. Thus, where to allocate the condition to be tested ($H_0$ or $H_1$) is our choice. It usually depends on what making a wrong decision means or implies for the specific framework we are working in. We are going to solve both cases. From a theoretical point of view, $H_0: \eta_I \ge \eta_R$ is essentially the same as $H_0: \eta_I = \eta_R$. As a final remark, in this exercise it holds that $0.53 + 0.47 = 1$. This happens just by chance, since these two quantities are independent and can take any value in $[0,1]$. On the other hand, proportions are always dimensionless.
Statistic: We know that there are two independent Bernoulli populations and that the sample sizes are larger than 30. We therefore use the asymptotic result involving two proportions:
$$T(I,R) = \frac{(\hat\eta_I - \hat\eta_R) - (\eta_I - \eta_R)}{\sqrt{\frac{?_I(1-?_I)}{n_I} + \frac{?_R(1-?_R)}{n_R}}} \xrightarrow{d} N(0,1)$$
where each ? must be substituted by the best possible information: supposed or estimated.
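Once the ? are replaced by the sample proportions, as the text does next, base R can compute the statistic, both p-values and the two type II errors the statement asks for. This sketch covers only the unpooled large-sample statistic. The variable names mimic those of the R code in this exercise but are otherwise assumptions.

```r
# Sketch: unpooled large-sample statistic for eta_I - eta_R (theta0 = 0)
ni <- 700; nr <- 700
spi <- 0.53; spr <- 0.47
alpha <- 0.01
se <- sqrt(spi * (1 - spi) / ni + spr * (1 - spr) / nr)   # estimated std. error
T0 <- (spi - spr - 0) / se                                # about 2.25

# Question in H0 (H1: theta < 0): left tail
pv_H0 <- pnorm(T0)                                        # about 0.988
beta_H0 <- 1 - pnorm(qnorm(alpha) - (-0.1) / se)          # beta(-0.1), about 0.078

# Question in H1 (H1: theta > 0): right tail
pv_H1 <- 1 - pnorm(T0)                                    # about 0.012
beta_H1 <- pnorm(qnorm(1 - alpha) - 0.1 / se)             # beta(0.1), about 0.078

cat("T0 =", round(T0, 2), " pv_H0 =", round(pv_H0, 3), " pv_H1 =", round(pv_H1, 3), "\n")
cat("beta(-0.1) =", round(beta_H0, 3), " beta(0.1) =", round(beta_H1, 3), "\n")
```

The two type II errors coincide because the standard normal distribution is symmetric and the alternatives $\pm 0.1$ are symmetric around $\theta_0 = 0$.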
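Two particular versions of this statistic will be used:
$$T_0(I,R) = \frac{(\hat\eta_I - \hat\eta_R) - \theta_0}{\sqrt{\frac{\hat\eta_I(1-\hat\eta_I)}{n_I} + \frac{\hat\eta_R(1-\hat\eta_R)}{n_R}}} \xrightarrow{d} N(0,1) \qquad\text{and}\qquad T_1(I,R) = \frac{(\hat\eta_I - \hat\eta_R) - \theta_1}{\sqrt{\frac{\hat\eta_I(1-\hat\eta_I)}{n_I} + \frac{\hat\eta_R(1-\hat\eta_R)}{n_R}}} \xrightarrow{d} N(0,1)$$
To determine the critical region or to calculate the p-value, both under $H_0$, we need the value of the statistic for the particular samples available:
$$T_0(i,r) = \frac{(0.53 - 0.47) - 0}{\sqrt{\frac{0.53(1-0.53)}{700} + \frac{0.47(1-0.47)}{700}}} = 2.25$$
Question in H0
Hypotheses: If we want to allocate the question in the null hypothesis, so as to reject it only when the data strongly suggest so,
$$H_0: \eta_I - \eta_R = \theta_0 \ge 0 \qquad\text{and}\qquad H_1: \eta_I - \eta_R = \theta_1 < 0$$
By looking at the alternative hypothesis, we deduce the form of the critical region: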

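The quantity $c$ can be thought of as a margin over $\theta_0$. It avoids excluding cases where $\eta_I - \eta_R = \theta_0 = 0$ really holds and values slightly smaller than $\theta_0$ are due to mere random effects.
Decision: To apply the first methodology, the critical value $a$ that determines the rejection region is found by applying the definition of type I error, with the value $\alpha = 1 - 0.99 = 0.01$ at $\theta_0 = 0$:
$$\alpha = 0.01 = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(I,R) \in R_c \mid H_0) = P(T_0(I,R) < a) \;\Rightarrow\; a = l_{0.01} = -2.326$$
$$R_c = \{T_0(I,R) < -2.326\}$$
The decision is: $T_0(i,r) = 2.25 \;\Rightarrow\; T_0(i,r) \notin R_c \;\Rightarrow\; H_0$ is not rejected.
As regards the value of the type I error, it is $\alpha$ by definition.
The second methodology is based on the calculation of the p-value:
$$pv = P((I,R) \text{ more rejecting than } (i,r) \mid H_0 \text{ true}) = P(T_0(I,R) < T_0(i,r)) = P(T_0(I,R) < 2.25) = 0.988$$
$pv = 0.988 > 0.01 = \alpha \;\Rightarrow\; H_0$ is not rejected.
Type II error: To calculate $\beta$, we have to work under $H_1$. The critical region is expressed in terms of $T_0$, but we must use $T_1$. We therefore apply the mathematical trick of adding and subtracting the same quantity. Write $D = \sqrt{\frac{\hat\eta_I(1-\hat\eta_I)}{n_I} + \frac{\hat\eta_R(1-\hat\eta_R)}{n_R}}$ for the common denominator of $T_0$ and $T_1$:
$$\beta(\theta_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0(I,R) \notin R_c \mid H_1) = P\left(\frac{\hat\eta_I - \hat\eta_R - \theta_0}{D} \ge -2.326 \,\middle|\, H_1\right)$$
$$= P\left(\frac{\hat\eta_I - \hat\eta_R - \theta_1 + \theta_1 - \theta_0}{D} \ge -2.326 \,\middle|\, H_1\right) = P\left(T_1(I,R) \ge -2.326 - \frac{\theta_1 - 0}{\sqrt{\frac{0.53(1-0.53)}{700} + \frac{0.47(1-0.47)}{700}}}\right)$$
For the particular value $\theta_1 = -0.1$,
$$\beta(-0.1) = P\left(T_1(I,R) \ge -2.326 + \frac{0.1}{\sqrt{\frac{0.53(1-0.53)}{700} + \frac{0.47(1-0.47)}{700}}}\right) = P(T_1(I,R) \ge 1.42) = 0.078$$
By using a computer, many other values $\theta_1 < 0$ can be considered so as to numerically determine the power curve $1 - \beta(\theta_1)$ of the test and to plot the power function.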

$$\phi(\theta) = P(\text{Reject } H_0) = \begin{cases} \alpha(\theta) & \text{if } \theta \in \Theta_0 \\ 1 - \beta(\theta) & \text{if } \theta \in \Theta_1 \end{cases}$$
# Sample and inference
ni = 700; nr = 700
spi = 0.53; spr = 0.47
alpha = 0.01
theta0 = 0 # Value under the null hypothesis H0
q = qnorm(alpha,0,1)
theta = seq(from=-0.5,to=0,0.01)
paramspace = sort(unique(c(theta,theta0)))
PowerFunction = pnorm(q-paramspace/sqrt(spi*(1-spi)/ni + spr*(1-spr)/nr),0,1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
This code generates the following figure:
Question in H1
Hypotheses: If we want to allocate the question in the alternative hypothesis, so as to accept it only when the data strongly suggest so,
$$H_0: \eta_I - \eta_R = \theta_0 \le 0 \qquad\text{and}\qquad H_1: \eta_I - \eta_R = \theta_1 > 0$$
By looking at the alternative hypothesis, we deduce the form of the critical region:
The quantity $c$ can be thought of as a margin over $\theta_0$. It avoids excluding cases where $\eta_I - \eta_R = \theta_0 = 0$ really holds and values slightly larger than $\theta_0$ are due to mere random effects.
Decision: To apply the first methodology, the critical value $a$ is calculated as follows:
$$\alpha = 0.01 = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(I,R) \in R_c \mid H_0) = P(T_0(I,R) > a) \;\Rightarrow\; a = r_{0.01} = 2.326$$
$$R_c = \{T_0(I,R) > 2.326\}$$
The decision is: $T_0(i,r) = 2.25 \;\Rightarrow\; T_0(i,r) \notin R_c \;\Rightarrow\; H_0$ is not rejected.
The second methodology consists of computing:

$$pv = P((I,R) \text{ more rejecting than } (i,r) \mid H_0 \text{ true}) = P(T_0(I,R) > T_0(i,r)) = P(T_0(I,R) > 2.25) = 1 - P(T_0(I,R) \le 2.25) = 0.012$$
$pv = 0.012 > 0.01 = \alpha \;\Rightarrow\; H_0$ is not rejected.
Type II error: Finally, to calculate $\beta$, let $D = \sqrt{\frac{\hat\eta_I(1-\hat\eta_I)}{n_I} + \frac{\hat\eta_R(1-\hat\eta_R)}{n_R}}$ be the common denominator of $T_0$ and $T_1$:
$$\beta(\theta_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0(I,R) \notin R_c \mid H_1) = P\left(\frac{\hat\eta_I - \hat\eta_R - \theta_0}{D} \le 2.326 \,\middle|\, H_1\right)$$
$$= P\left(\frac{\hat\eta_I - \hat\eta_R - \theta_1 + \theta_1 - \theta_0}{D} \le 2.326 \,\middle|\, H_1\right) = P\left(T_1(I,R) \le 2.326 - \frac{\theta_1 - 0}{\sqrt{\frac{0.53(1-0.53)}{700} + \frac{0.47(1-0.47)}{700}}}\right)$$
For the particular value $\theta_1 = 0.1$,
$$\beta(0.1) = P\left(T_1(I,R) \le 2.326 - \frac{0.1}{\sqrt{\frac{0.53(1-0.53)}{700} + \frac{0.47(1-0.47)}{700}}}\right) = P(T_1(I,R) \le -1.42) = 0.078$$
By using a computer, many more values $\theta_1 > 0$ can be considered so as to numerically determine the power curve $1 - \beta(\theta_1)$ of the test and to plot the power function.
$$\phi(\theta) = P(\text{Reject } H_0) = \begin{cases} \alpha(\theta) & \text{if } \theta \in \Theta_0 \\ 1 - \beta(\theta) & \text{if } \theta \in \Theta_1 \end{cases}$$
# Sample and inference
ni = 700; nr = 700
spi = 0.53; spr = 0.47
alpha = 0.01
theta0 = 0 # Value under the null hypothesis H0
q = qnorm(1-alpha,0,1)
theta = seq(from=0,to=0.5,0.01)
paramspace = sort(unique(c(theta,theta0)))
PowerFunction = 1 - pnorm(q-paramspace/sqrt(spi*(1-spi)/ni + spr*(1-spr)/nr),0,1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
This code generates the figure above.
Conclusion: The hypothesis that the two proportions are equal is not rejected when the question is allocated in either the alternative or the null hypothesis (the best way of testing an equality). That is, it seems that both populations wish to visit Spain with the same desire. The sample information $\hat\eta_I = 0.53$ and $\hat\eta_R = 0.47$ suggested the alternative hypothesis $H_1: \eta_I - \eta_R > 0$. The two power functions show how symmetric the situations are. Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.
Advanced theory: Under the hypothesis $H_0: \eta_I = \eta = \eta_R$, it makes sense to estimate the common variance $\eta(1-\eta)$ of the estimator (in the denominator) as well as possible. This can be done by using the pooled sample proportion
$$\hat\eta_p = \frac{n_I\hat\eta_I + n_R\hat\eta_R}{n_I + n_R}$$
Nevertheless, the pooled estimator should not be used in the numerator, since $\hat\eta_p - \hat\eta_p = 0$ whatever the data are. Now, the statistic under the null hypothesis is:
$$\tilde T_0(I,R) = \frac{(\hat\eta_I - \hat\eta_R) - \theta_0}{\sqrt{\frac{\hat\eta_p(1-\hat\eta_p)}{n_I} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_R}}} \xrightarrow{d} N(0,1)$$
Then,
$$\tilde T_0(I,R) = T_0(I,R)\,\frac{\sqrt{\frac{\hat\eta_I(1-\hat\eta_I)}{n_I} + \frac{\hat\eta_R(1-\hat\eta_R)}{n_R}}}{\sqrt{\frac{\hat\eta_p(1-\hat\eta_p)}{n_I} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_R}}}$$
with
$$\hat\eta_p = \frac{700 \cdot 0.53 + 700 \cdot 0.47}{700 + 700} = 0.5 \qquad\text{and}\qquad \frac{\sqrt{\frac{0.53(1-0.53)}{700} + \frac{0.47(1-0.47)}{700}}}{\sqrt{\frac{0.5(1-0.5)}{700} + \frac{0.5(1-0.5)}{700}}} = 0.9981983$$

$$\tilde T_0(i,r) = 2.25 \cdot 0.9981983 = 2.24$$
The same decisions are made with $T_0$ and $\tilde T_0$ because using $\hat\eta_p$ has little effect in this exercise (see the value of the quotient of square roots above). In other situations, the two ways may lead to paradoxical results. For the type II error, both mathematical tricks (multiplying and dividing by the same quantity, and adding and subtracting the same quantity) should now be applied. Writing $D_u = \sqrt{\frac{\hat\eta_I(1-\hat\eta_I)}{n_I} + \frac{\hat\eta_R(1-\hat\eta_R)}{n_R}}$ and $D_p = \sqrt{\frac{\hat\eta_p(1-\hat\eta_p)}{n_I} + \frac{\hat\eta_p(1-\hat\eta_p)}{n_R}}$, for the case with the question in $H_0$:
$$\beta(\theta_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(\tilde T_0(I,R) \notin R_c \mid H_1) = P\left(\frac{\hat\eta_I - \hat\eta_R - \theta_0}{D_p} \ge -2.326 \,\middle|\, H_1\right)$$
$$= P\left(\frac{\hat\eta_I - \hat\eta_R - \theta_0}{D_u} \ge -2.326\,\frac{D_p}{D_u} \,\middle|\, H_1\right) = P\left(\frac{\hat\eta_I - \hat\eta_R - \theta_1 + \theta_1 - \theta_0}{D_u} \ge -2.330 \,\middle|\, H_1\right) = P\left(T_1(I,R) \ge -2.330 - \frac{\theta_1}{D_u}\right)$$
For the particular value $\theta_1 = -0.1$,
$$\beta(-0.1) = P\left(T_1(I,R) \ge -2.330 + \frac{0.1}{\sqrt{\frac{0.53(1-0.53)}{700} + \frac{0.47(1-0.47)}{700}}}\right) = P(T_1(I,R) \ge 1.42) \approx 0.078$$
The case with the question in $H_1$ is analogous.
My notes:
[HT-p] Based on Λ
Exercise 1ht-Λ
A random quantity $X$ follows a Poisson distribution. Let $X_1, \dots, X_n$ be a simple random sample. By applying the results involving Neyman-Pearson's lemma and the likelihood ratio, study the critical region (the estimator that arises and the form) for the following pairs of hypotheses.

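$$\begin{cases} H_0: \lambda = \lambda_0 \\ H_1: \lambda = \lambda_1 \end{cases} \quad \begin{cases} H_0: \lambda = \lambda_0 \\ H_1: \lambda = \lambda_1 > \lambda_0 \end{cases} \quad \begin{cases} H_0: \lambda = \lambda_0 \\ H_1: \lambda = \lambda_1 < \lambda_0 \end{cases} \quad \begin{cases} H_0: \lambda \le \lambda_0 \\ H_1: \lambda = \lambda_1 > \lambda_0 \end{cases} \quad \begin{cases} H_0: \lambda \ge \lambda_0 \\ H_1: \lambda = \lambda_1 < \lambda_0 \end{cases}$$
Discussion: This is a theoretical exercise where no assumption should be evaluated. First of all, Neyman-Pearson's lemma will be applied. We expect the maximum likelihood estimator of the parameter (calculated in a previous exercise) and the usual form of critical region to appear. If the critical region does not depend on any particular value $\theta_1$, the uniformly most powerful test will have been found.
Poisson distribution: $X \sim \text{Pois}(\lambda)$
For the Poisson distribution,
Identification of the variable: $X \sim \text{Pois}(\lambda)$
Hypothesis test: $H_0: \lambda = \lambda_0$, $H_1: \lambda = \lambda_1$
Likelihood function and likelihood ratio:
$$L(x;\lambda) = \prod_{j=1}^n \frac{\lambda^{x_j}}{x_j!}e^{-\lambda} = \frac{\lambda^{\sum_{j=1}^n x_j}}{\prod_{j=1}^n x_j!}\,e^{-n\lambda} \qquad\text{and}\qquad \Lambda(x;\lambda_0,\lambda_1) = \frac{L(x;\lambda_0)}{L(x;\lambda_1)} = \left(\frac{\lambda_0}{\lambda_1}\right)^{\sum_{j=1}^n x_j} e^{-n(\lambda_0 - \lambda_1)}$$
Rejection region:
$$R_c = \{\Lambda < k\} = \left\{\left(\frac{\lambda_0}{\lambda_1}\right)^{\sum_j x_j} e^{-n(\lambda_0-\lambda_1)} < k\right\} = \left\{\sum_{j=1}^n x_j \log\frac{\lambda_0}{\lambda_1} - n(\lambda_0 - \lambda_1) < \log k\right\} = \left\{\sum_{j=1}^n x_j \log\frac{\lambda_0}{\lambda_1} < \log k + n(\lambda_0 - \lambda_1)\right\}$$
Now it is necessary that $\lambda_1 \ne \lambda_0$, and
if $\lambda_1 < \lambda_0$ then $\log(\lambda_0/\lambda_1) > 0$ and hence $R_c = \left\{\frac{1}{n}\sum_{j=1}^n x_j < \frac{\log k + n(\lambda_0 - \lambda_1)}{n\log(\lambda_0/\lambda_1)}\right\}$;
if $\lambda_1 > \lambda_0$ then $\log(\lambda_0/\lambda_1) < 0$ and hence $R_c = \left\{\frac{1}{n}\sum_{j=1}^n x_j > \frac{\log k + n(\lambda_0 - \lambda_1)}{n\log(\lambda_0/\lambda_1)}\right\}$.
This suggests the estimator $\hat\lambda_{ML} = \bar X$ (calculated in a previous exercise) and regions of the form
$$R_c = \{\Lambda < k\} = \{\hat\lambda_{ML} < c\} = \{T_0 < a\} \qquad\text{or}\qquad R_c = \{\Lambda < k\} = \{\hat\lambda_{ML} > c\} = \{T_0 > a\}$$
Hypothesis tests: $\begin{cases} H_0: \lambda = \lambda_0 \\ H_1: \lambda = \lambda_1 > \lambda_0 \end{cases}$ and $\begin{cases} H_0: \lambda = \lambda_0 \\ H_1: \lambda = \lambda_1 < \lambda_0 \end{cases}$
In applying the methodologies, and given $\alpha$, the same critical value $c$ or $a$ will be obtained for any $\lambda_1$, since it only depends upon $\lambda_0$ through $\hat\lambda_{ML}$ or $T_0$: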

or αp Type I error P T 0 <a αp Type I errorp T 0 > a This implies that the uiformly most powerful test has bee foud. { Hypothesis tests { H 0 : λ λ0 H : λ λ > λ 0 H 0 : λ λ0 H : λ λ < λ 0 A uiformly most powerful test for H 0 : λ λ0 is also uiformly most powerful for H 0 : λ λ0. Expoetial distributio: For the expoetial distributio, Idetificatio of the variable: Hypothesis test { ~ Expλ H 0: λ λ0 H : λ λ Likelihood fuctio ad likelihood ratio: λ j L ; λ λ e ad j L ; λ 0 λ0 λ λ Λ ; λ 0, λ e L ; λ λ 0 j j Rejectio regio: Rc { Λ < k } { λ0 λ λ e λ 0 j { j }{ < k log λλ λ λ 0 0 j j < logk } { } } λ λ λ λ 0 j j < logk log λ 0 λ λ 0 < log k log λ 0 Now it is ecessary that λ λ 0 ad { > if λ < λ 0 the λ λ 0 < 0 ad Rc { λ log k log λ 0 }{ < λ λ0 logk log < if λ > λ 0 the λ λ 0 > 0 ad Rc λ0 }{ λ > λ λ0 λ0 λ } λ0 λ } λ λ 0 log k log λ λ 0 logk log λ ML calculated i a previous exercise ad regios of the form Rc {Λ< k } { λ ML <c } {T 0 < a } or Rc {Λ< k } { λ^ ML >c } {T 0 > a } This suggests the estimator Hypothesis tests { H 0 : λ λ0 H : λ λ > λ 0 { H 0 : λ λ0 H : λ λ < λ 0 I applyig the methodologies, ad give α, the same critical value c or a will be obtaied for ay λ sice it 9 Solved Exercises ad Problems of Statistical Iferece

oly depeds upo λ0 through λ^ ML or T0: αp Type I error P T 0 <a or αp Type I errorp T 0 > a This implies that the uiformly most powerful test has bee foud. { Hypothesis tests { H 0 : λ λ0 H : λ λ >λ 0 H 0 : λ λ0 H : λ λ <λ 0 A uiformly most powerful test for H 0 : λ λ0 is also uiformly most powerful for H 0 : λ λ0. Beroulli distributio: For the Beroulli law, Idetificatio of the variable: Hypothesis test { ~ Bη H 0 : η η0 H : η η Likelihood fuctio ad likelihood ratio: j j L ; η η j η ad j L ; η0 η0 Λ ; η0, η L ; η η j j η0 η j j Rejectio regio: { }{ } { [ ] } { η Rc { Λ < k } η0 j j log η0 η j j j <k η0 η0 η0 log η log < log k log η η j η0 η η0 < log k log η η0 η Now it is ecessary that η η0 ad if η < η0 the log if η > η0 the log { { η0 η < > 0 ad Rc η η0 η0 η > <0 ad Rc η η0 log η0 η η η0 log k log log } } η0 η log k log η0 η η0 η η η0 η ML calculated i a previous exercise ad regios of the form This suggests the estimator 0 } η0 η0 log log < logk j j η j j η Solved Exercises ad Problems of Statistical Iferece

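$R_c = \{\Lambda < k\} = \{\hat\eta_{ML} < c\} = \{T_0 < a\}$ or $R_c = \{\Lambda < k\} = \{\hat\eta_{ML} > c\} = \{T_0 > a\}$.
Hypothesis tests $\{H_0: \eta = \eta_0;\ H_1: \eta > \eta_0\}$ and $\{H_0: \eta = \eta_0;\ H_1: \eta < \eta_0\}$: In applying the methodologies, and given α, the same critical value c or a will be obtained for any $\eta_1$, since it depends on $\eta_0$ only through $\hat\eta_{ML}$ or $T_0$:
$\alpha = P(\text{Type I error}) = P(T_0 < a)$ or $\alpha = P(\text{Type I error}) = P(T_0 > a)$.
This implies that the uniformly most powerful test has been found.
Hypothesis tests $\{H_0: \eta \le \eta_0;\ H_1: \eta > \eta_0\}$ and $\{H_0: \eta \ge \eta_0;\ H_1: \eta < \eta_0\}$: A uniformly most powerful test for $H_0: \eta = \eta_0$ is also uniformly most powerful for $H_0: \eta \le \eta_0$ (respectively, $H_0: \eta \ge \eta_0$).

Normal distribution
Identification of the variable: $X \sim N(\mu, \sigma^2)$, with σ known
Hypothesis test: $H_0: \mu = \mu_0$ and $H_1: \mu = \mu_1$
Likelihood function and likelihood ratio:
$$L(X;\mu) = \frac{1}{(2\pi\sigma^2)^{n/2}}\,e^{-\frac{1}{2\sigma^2}\sum_j (X_j-\mu)^2}$$
and
$$\Lambda(X;\mu_0,\mu_1) = \frac{L(X;\mu_0)}{L(X;\mu_1)} = e^{-\frac{1}{2\sigma^2}\left[\sum_j (X_j-\mu_0)^2 - \sum_j (X_j-\mu_1)^2\right]} = e^{\frac{\mu_0-\mu_1}{\sigma^2}\sum_j X_j}\; e^{-\frac{n(\mu_0^2-\mu_1^2)}{2\sigma^2}}$$
Rejection region:
$$R_c = \{\Lambda < k\} = \left\{\frac{\mu_0-\mu_1}{\sigma^2}\sum_j X_j - \frac{n(\mu_0^2-\mu_1^2)}{2\sigma^2} < \log k\right\} = \left\{(\mu_0-\mu_1)\sum_j X_j < \sigma^2\log k + \frac{n(\mu_0^2-\mu_1^2)}{2}\right\}$$
Now it is necessary that $\mu_1 \ne \mu_0$, and
if $\mu_1 < \mu_0$, then $\mu_0 - \mu_1 > 0$ and $R_c = \left\{\bar X < \dfrac{\sigma^2\log k}{n(\mu_0-\mu_1)} + \dfrac{\mu_0+\mu_1}{2}\right\}$;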

if $\mu_1 > \mu_0$, then $\mu_0 - \mu_1 < 0$ and $R_c = \left\{\bar X > \dfrac{\sigma^2\log k}{n(\mu_0-\mu_1)} + \dfrac{\mu_0+\mu_1}{2}\right\}$.
This suggests the estimator $\hat\mu_{ML} = \bar X$, calculated in a previous exercise, and regions of the form
$R_c = \{\Lambda < k\} = \{\hat\mu_{ML} < c\} = \{T_0 < a\}$ or $R_c = \{\Lambda < k\} = \{\hat\mu_{ML} > c\} = \{T_0 > a\}$.
Hypothesis tests $\{H_0: \mu = \mu_0;\ H_1: \mu > \mu_0\}$ and $\{H_0: \mu = \mu_0;\ H_1: \mu < \mu_0\}$: In applying the methodologies, and given α, the same critical value c or a will be obtained for any $\mu_1$, since it depends on $\mu_0$ only through $\hat\mu_{ML}$ or $T_0$:
$\alpha = P(\text{Type I error}) = P(T_0 < a)$ or $\alpha = P(\text{Type I error}) = P(T_0 > a)$.
This implies that the uniformly most powerful test has been found.
Hypothesis tests $\{H_0: \mu \le \mu_0;\ H_1: \mu > \mu_0\}$ and $\{H_0: \mu \ge \mu_0;\ H_1: \mu < \mu_0\}$: A uniformly most powerful test for $H_0: \mu = \mu_0$ is also uniformly most powerful for $H_0: \mu \le \mu_0$ (respectively, $H_0: \mu \ge \mu_0$).

Conclusion: Well-known theoretical results have been applied to study the optimal form of the critical region for different pairs of hypotheses. Since both the likelihood ratio and the maximum likelihood estimator are built from the likelihood function, the critical region of these tests can be expressed in terms of this estimator.

[HT-p] Analysis of Variance (ANOVA)

Exercise 1ht-av
The fog index is used to measure the reading difficulty of a written text: the higher the value of the index, the more difficult the reading level. We want to know whether the reading difficulty index is different for three magazines: Scientific American, Fortune, and The New Yorker. Three independent random samples of 6 advertisements were taken, and the fog indices for the 18 advertisements were measured, as recorded in the following table

SCIENTIFIC AMERICAN   FORTUNE   NEW YORKER
15.75                 12.63     9.27
11.55                 11.46     8.28
11.16                 10.77     8.15
9.92                  9.93      6.37
9.23                  9.87      6.37
8.20                  9.42      5.66

Apply an analysis of variance to test whether the average level of difficulty is the same in the three magazines.
(From Statistics for Business and Economics, Newbold, P., W.L. Carlson and B.M. Thorne, Pearson.)

Discussion: The analysis of variance can be applied when the populations are normally distributed and their variances are equal, that is, $X_p \sim N(\mu_p, \sigma_p^2)$ with $\sigma_p = \sigma$ for all p. These suppositions should be evaluated (this will be done at the end of the exercise). If the equality of the means is rejected, additional analyses are necessary to identify which means are different, since this information is not provided by the analysis of variance. On the other hand, the calculations involved in this analysis are so tedious that almost everybody uses the computer. Finally, the unit of measurement of the index, u, is unknown to us.

Statistic: There is one factor identifying the population out of the three possible ones (we do not consider other magazines), so a one-factor fixed-effects analysis will be applied. The statistic is
$$T_0(X_{SA}, X_{FO}, X_{NY}) = \frac{MSG}{MSW} \sim F_{P-1,\,n-P} = F_{3-1,\,18-3} = F_{2,\,15}.$$
Some calculations are necessary to evaluate the statistic $T_0(x_{SA}, x_{FO}, x_{NY})$. First of all, we look at the three sample means:
$$\bar x_{SA} = \frac{1}{6}\sum_{j=1}^{6} x_{SA,j} = \frac{15.75u + \cdots + 8.20u}{6} = 10.97u, \qquad \bar x_{FO} = \frac{1}{6}\sum_{j=1}^{6} x_{FO,j} = \frac{12.63u + \cdots + 9.42u}{6} = 10.68u, \qquad \bar x_{NY} = \frac{1}{6}\sum_{j=1}^{6} x_{NY,j} = \frac{9.27u + \cdots + 5.66u}{6} = 7.35u.$$
The magnitudes of the first and the third seem quite different, which suggests that the population means may be different. Nevertheless, we should not trust intuition. The total sample mean is
$$\bar x = \frac{1}{18}\sum_{p}\sum_{j} x_{p,j} = \frac{15.75u + \cdots + 5.66u}{18} = 9.67u,$$
so that
$$SSG = \sum_{p=1}^{P} n_p(\bar x_p - \bar x)^2 = 6(10.97u - 9.67u)^2 + 6(10.68u - 9.67u)^2 + 6(7.35u - 9.67u)^2 = 48.53u^2, \qquad MSG = \frac{SSG}{P-1} = \frac{48.53u^2}{3-1} = 24.26u^2,$$
$$SSW = \sum_{p=1}^{P}\sum_{j=1}^{n_p}(x_{p,j} - \bar x_p)^2 = \sum_{j}(x_{SA,j} - 10.97u)^2 + \sum_{j}(x_{FO,j} - 10.68u)^2 + \sum_{j}(x_{NY,j} - 7.35u)^2 = 52.22u^2, \qquad MSW = \frac{SSW}{n-P} = \frac{52.22u^2}{18-3} = 3.48u^2$$
and, finally,
$$T_0(x_{SA}, x_{FO}, x_{NY}) = \frac{MSG}{MSW} = \frac{24.26u^2}{3.48u^2} = 6.97.$$

Hypotheses and form of the critical region:
$H_0: \mu_1 = \mu_2 = \mu_3$ and $H_1: \exists\, a, b \text{ such that } \mu_a \ne \mu_b$

For this statistic, the critical region has the form $R_c = \{T_0 > a\}$. By applying the definition of α:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T \in R_c \mid H_0) = P(T_0 > a) \;\rightarrow\; a = r_\alpha = 6.359 \;\rightarrow\; R_c = \{T_0(X_{SA}, X_{FO}, X_{NY}) > 6.359\}$$
> qf(0.99, 2, 15)
[1] 6.358873

Decision: Finally, it is necessary to check whether this region suggested by H0 is compatible with the value that the data provide for the statistic. If they are not compatible, because the value seems extreme when the hypothesis is true, we will trust the data and reject the hypothesis H0.
Since $T_0(x_{SA}, x_{FO}, x_{NY}) = 6.97 > 6.359$, we have $T_0(x) \in R_c$ and H0 is rejected.
The second methodology is based on the calculation of the p-value:
$$pv = P\big((X_{SA}, X_{FO}, X_{NY}) \text{ more rejecting than } (x_{SA}, x_{FO}, x_{NY}) \mid H_0 \text{ true}\big) = P\big(T_0(X_{SA}, X_{FO}, X_{NY}) > T_0(x_{SA}, x_{FO}, x_{NY})\big) = P(T_0 > 6.97) = 0.00723$$
Since pv = 0.00723 < 0.01 = α, H0 is rejected.
> 1 - pf(6.97, 2, 15)
[1] 0.007235116

Conclusion: As suggested by the sample means, the population means of the three magazines are not equal, with a confidence of 0.99 (measured on a 0-to-1 scale). Pairwise comparisons could be applied to identify the differences.

Code to apply the analysis semi-manually
We have not done the calculations by hand but with the programming language R. The code is:

# To enter the three samples
SA = c(15.75, 11.55, 11.16, 9.92, 9.23, 8.20)
FO = c(12.63, 11.46, 10.77, 9.93, 9.87, 9.42)
NY = c(9.27, 8.28, 8.15, 6.37, 6.37, 5.66)
# To join the samples in a unique vector
Data = c(SA, FO, NY)
# To calculate the sample means of the three groups and the total sample mean
mean(SA) ; mean(FO) ; mean(NY) ; mean(Data)
# To calculate the measures and the statistic (for large datasets, the previous means should have been saved)
SSG = 6*(mean(SA) - mean(Data))^2 + 6*(mean(FO) - mean(Data))^2 + 6*(mean(NY) - mean(Data))^2
MSG = SSG/(3-1)
SSW = sum((SA - mean(SA))^2) + sum((FO - mean(FO))^2) + sum((NY - mean(NY))^2)
MSW = SSW/(18-3)
T0 = MSG/MSW
# To find the quantile 'a' that determines the critical region
a = qf(0.99, 2, 15)
# To calculate the p-value
pvalue = 1 - pf(T0, 2, 15)

In the console, write the name of a quantity to print its value.

Code to apply the analysis with R
Statistical software programs have many built-in functions to apply the most basic methods. Now we use R to obtain the analysis of variance table. As regards the syntax, it is based on the linear regression framework,
$$X_{p,j} = \mu_p + \epsilon_{p,j},$$
where this linear dependence of X on the factor effect $\mu_p$ is denoted by Data ~ Group (see the call to the function aov below).

## After running the first block of lines of the previous code:
# To create a vector with the membership labels
Group = factor(c(rep("SA", length(SA)), rep("FO", length(FO)), rep("NY", length(NY))))
# To apply a one-factor analysis of variance
objectAV = aov(Data ~ Group)
# To print the table with the results
summary(objectAV)

The ANOVA table is

            Df Sum Sq Mean Sq F value  Pr(>F)   
Group        2  48.53   24.26    6.97 0.00723 **
Residuals   15  52.22    3.48                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Compare these quantities with those obtained in the previous calculations. An equivalent way of applying the analysis of variance with R consists in substituting the lines

# To apply a one-factor analysis of variance
objectAV = aov(Data ~ Group)
# To print the table with the results
summary(objectAV)

by the lines

# To fit a linear regression model
Model = lm(Data ~ Group)
# To apply and print the analysis of variance
anova(Model)

Code to check the assumptions
By using a computer it is also easy to evaluate the fulfillment of the assumptions.

# To enter the three samples
SA = c(15.75, 11.55, 11.16, 9.92, 9.23, 8.20)
FO = c(12.63, 11.46, 10.77, 9.93, 9.87, 9.42)
NY = c(9.27, 8.28, 8.15, 6.37, 6.37, 5.66)
# To join the samples in a unique vector
Data = c(SA, FO, NY)
# To create a vector with the membership labels
Group = factor(c(rep("SA", length(SA)), rep("FO", length(FO)), rep("NY", length(NY))))
# To test the normality of the sample SA by applying two different hypothesis tests
# (the Kolmogorov-Smirnov p-value is only approximate when the mean and sd are estimated from the same sample)
shapiro.test(SA)
ks.test(SA, "pnorm", mean = mean(SA), sd = sd(SA))
# To test the normality of the sample FO by applying two different hypothesis tests
shapiro.test(FO)
ks.test(FO, "pnorm", mean = mean(FO), sd = sd(FO))
# To test the normality of the sample NY by applying two different hypothesis tests
shapiro.test(NY)
ks.test(NY, "pnorm", mean = mean(NY), sd = sd(NY))
# To test the equality of the variances
bartlett.test(Data ~ Group)
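The conclusion above mentions that pairwise comparisons could be applied to identify which means differ. A minimal sketch, assuming Tukey's honest significant differences is an acceptable choice (the exercise does not name a procedure), applies base R's `TukeyHSD()` to the same `aov` fit; the object names `fit` and `tk` are illustrative only.

```r
# Fog-index samples (unit u): SA = Scientific American, FO = Fortune, NY = New Yorker
SA = c(15.75, 11.55, 11.16, 9.92, 9.23, 8.20)
FO = c(12.63, 11.46, 10.77, 9.93, 9.87, 9.42)
NY = c(9.27, 8.28, 8.15, 6.37, 6.37, 5.66)

Data  = c(SA, FO, NY)
Group = factor(rep(c("SA", "FO", "NY"), times = c(length(SA), length(FO), length(NY))))

# Same one-factor fixed-effects model as in the text
fit = aov(Data ~ Group)

# Tukey HSD: all pairwise differences of group means, with a family-wise
# confidence level of 0.99 to match alpha = 0.01 used in the F test
tk = TukeyHSD(fit, which = "Group", conf.level = 0.99)
print(tk)

# Adjusted p-values, one per pair (rows are labelled "level2-level1")
print(round(tk$Group[, "p adj"], 4))
```

Since the factor levels are ordered alphabetically, the rows are labelled `NY-FO`, `SA-FO` and `SA-NY`. The adjusted p-values control the family-wise error rate, so they need not fall below 0.01 just because the overall F test did; each `p adj` should be compared with the chosen α.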

[HT] Nonparametric

Remark ht: Nonparametric methods involve questions that are not based on parameters, and therefore it is not usually necessary to evaluate some kinds of supposition that were present in the parametric hypothesis tests.

Exercise 1ht-np (Occupational Hazards)
The following table is based on data from the U.S. Department of Labor, Bureau of Labor Statistics.

                                      Police   Cashiers   Taxi Drivers   Guards
Homicide                                82        107           70          59
Cause of death other than homicide      92          9           29          42

A) Use the data in the table, coming from a simple random sample, to test the claim that occupation is independent of whether the cause of death was homicide. Use a significance α = 0.05 and apply a nonparametric chi-square test.
B) Does any particular occupation appear to be most prone to homicides? If so, which one?
(Based on an exercise of Essentials of Statistics, Mario F. Triola, Pearson.)

LINGUISTIC NOTE (From: Longman Dictionary of Common Errors. Turton, N.D., and J.B. Heaton. Longman.)
job. Your job is what you do to earn your living: 'You'll never get a job if you don't have any qualifications.' 'She'd like to change her job but can't find anything better.' Your job is also the particular type of work that you do: 'John's new job sounds really interesting.' 'I know she works for the BBC but I'm not sure what job she does.' A job may be full-time or part-time (NOT half-time or half-day): 'All she could get was a part-time job at a petrol station.'
do (for a living). When you want to know about the type of work that someone does, the usual questions are What do you do? What does she do (for a living)? etc.: 'What does your father do?' - 'He's a police inspector.'
occupation. Occupation and job have similar meanings. However, occupation is far less common than job and is used mainly in formal and official styles: 'Please give brief details of your employment history and present occupation.' 'People in manual occupations seem to suffer less from stress.'
post/position. The particular job that you have in a company or organization is your post or position: 'She's been appointed to the post of deputy principal.' 'He's applied for the position of sales manager.' Post and position are used mainly in formal styles and often refer to jobs which have a lot of responsibility.
career. Your career is your working life, or the series of jobs that you have during your working life: 'The scandal brought his career in politics to a sudden end.' 'Later on in his career, he became first secretary at the British Embassy in Washington.' Your career is also the particular kind of work for which you are trained and that you intend to do for a long time: 'I wanted to find out more about careers in publishing.'
trade. A trade is a type of work in which you do or make things with your hands: 'Most of the men had worked in skilled trades such as carpentry or printing.' 'My grandfather was a bricklayer by trade.'
profession. A profession is a type of work such as medicine, teaching, or law which requires a high level of training or education: 'Until recently, medicine has been a male-dominated profession.' 'She entered the teaching profession in 1987.'

LINGUISTIC NOTE (From: The Careful Writer: A Modern Guide to English Usage. Bernstein, T.M. Atheneum.)
occupations. The words people use affectionately, humorously, or disparagingly to describe their own occupations are their own affair. They may say, I'm in show business (or, more likely, show biz), or I'm in the advertising racket, or I'm in the oil game, or I'm in the garment line. But outsiders should use more caution, more discretion, and more precision. For instance, it is improper to write, Mr. Danaher has been in the law business in Washington. Law is a profession. Similarly, to say someone is in the teaching game would undoubtedly give offense to teachers. Unless there is some special reason to be slangy or colloquial, the advisable thing to do is to accord every occupation the dignity it deserves.

Discussion: In this exercise, it is clear from the statement that we need to test the independence of two variables. A particular sample $x_1, \ldots, x_{490}$ was grouped, and we are given the absolute frequencies in the empirical table. By looking at the table, the cashier occupation appears to be the most prone to homicides (107 of its 116 deaths).

Statistic: Since we have to apply a test of independence, from a table of statistics (e.g. in [T]) we select
$$T_0(X) = \sum_{l=1}^{L}\sum_{k=1}^{K}\frac{(N_{lk} - \hat e_{lk})^2}{\hat e_{lk}} \xrightarrow{d} \chi^2_{(L-1)(K-1)},$$
for L and K classes, respectively.

Hypotheses: The null hypothesis supposes that the two variables are independent,
$H_0: X, Y \text{ independent}$ and $H_1: X, Y \text{ dependent}$,
or, probabilistically,
$H_0: f(x, y) = f_X(x)\,f_Y(y)$ and $H_1: f(x, y) \ne f_X(x)\,f_Y(y)$.
This implies that the probability at any cell is the product of the marginal probabilities of its row and column. Note that two underlying probability distributions are supposed for X and Y, although we do not care about them, and we will directly estimate the probabilities from the empirical table through $\hat e_{lk} = n\,\hat p_l\,\hat p_k = N_{l\cdot}N_{\cdot k}/n$. Then, by substituting in the expression of the statistic,
$$T_0(x) = \frac{\left(82 - \frac{318\cdot 174}{490}\right)^2}{\frac{318\cdot 174}{490}} + \cdots + \frac{\left(42 - \frac{172\cdot 101}{490}\right)^2}{\frac{172\cdot 101}{490}} = 65.52.$$
This value, calculated under H0 and using the data, is necessary both to determine the critical region and to calculate the p-value. On the other hand, for any chi-square test T0 is a nonnegative measure of the dissimilarity between the two tables; therefore, a value close to zero means that the two tables are similar, while the critical region is always of the form $R_c = \{T_0 > a\}$.

Decision: There are L = 2 and K = 4 classes, respectively, so
$$T_0 \xrightarrow{d} \chi^2_{(L-1)(K-1)} = \chi^2_{(2-1)(4-1)} = \chi^2_3.$$
For the first methodology, to calculate a, the definition of the type I error is applied with α = 0.05:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T \in R_c \mid H_0) = P(T_0 > a)$$

$\rightarrow a = r_\alpha = 7.81 \rightarrow R_c = \{T_0 > 7.81\}$
The decision is: $T_0(x) = 65.52 > 7.81$, so $T_0(x) \in R_c$ and H0 is rejected.
If we apply the methodology based on the p-value,
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 65.52) = 3.89\cdot 10^{-14}$$
Since pv < 0.05 = α, H0 is rejected.
> 1 - pchisq(65.52, 3)
[1] 3.885781e-14
Instead of using the computer, we can consider the last value in our table to bound the p-value (statisticians want to discover its value, while we only want to check whether or not it is smaller than α):
$$pv = P(T_0 > 65.52) < P(T_0 > 11.3) = 0.01 \;\rightarrow\; pv < 0.01 < 0.05 = \alpha$$
and H0 is rejected.

Conclusion: The hypothesis that the two variables are independent is rejected. This means that there seems to be an association between occupation and cause of death. Remember: statistical results depend on the assumptions, the methods, the certainty and the data.

Exercise 2ht-np (World War II Bomb Hits in London)
To carry out an analysis, South London was divided into 576 areas. For the variable N ≡ number of bombs in the k-th area (any), a simple random sample $x_1, \ldots, x_{576}$ was gathered and grouped in the following table:

EMPIRICAL
Number of Bombs      0     1     2    3    4    5 or more   Total
Number of Regions    229   211   93   35   7    1           576

Data taken from: An application of the Poisson distribution. Clarke, R. D. Journal of the Institute of Actuaries [JIA] (1946) 72: 481. http://www.actuaries.org.uk/research-and-resources/documents/application-poisson-distribution

By applying the chi-square goodness-of-fit methodology:
1) Test at 95% confidence whether N can be supposed to follow a Poisson distribution.
2) Test at 95% confidence whether N can be supposed to follow a Poisson distribution with λ = 0.8.

Discussion: We must apply the chi-square methodology to study whether the data statistically fit the models specified. In the second section, a value for the parameter is given. For this probability model, we have to calculate or estimate the probabilities in order to obtain the expected absolute frequencies. Finally, by using the statistic T0 we will compare the two tables and make a decision.

Statistic: Since we have to apply a goodness-of-fit test, from a table of statistics (e.g. in [T]) we select

$$T_0(X) = \sum_{k=1}^{K}\frac{(N_k - \hat e_k)^2}{\hat e_k} \xrightarrow{d} \chi^2_{K-1-s},$$
where K is the number of classes and s is the number of parameters of F0 that we need to estimate so as to use this distribution for obtaining the class probabilities (or approximations of them).

Fit to the Poisson family
Hypotheses: For this nonparametric goodness-of-fit test, the hypotheses are
$H_0: N \sim F_0 = \text{Pois}(\lambda)$ and $H_1: N \sim F \ne \text{Pois}(\lambda)$
Both hypotheses can be thought of as composite. To fill in the expected table under H0, the formula $\hat e_k = n\hat p_k$ will be applied. To estimate $\hat p_k$, the distribution supposed under H0 must be used, and to use this distribution an estimator $\hat\lambda$ of the parameter is necessary. Once we have this estimator, the probabilities are calculated by using software, tables, or the mass function $f(x;\lambda)$ plus the plug-in principle. To estimate λ, we take into account that for this distribution both the expectation and the variance are equal to the parameter. Since the sample mean estimates the expectation, in this case it can be used to estimate λ too. (Had we not remembered this property, we would have applied the method of the moments or the maximum likelihood method to obtain this estimator.) Then,
$$\hat\lambda = \hat\mu = \bar x = \frac{1}{576}\sum_{j=1}^{576} x_j.$$
Since our data are grouped, we can imagine (look at the table) that 229 data are 0's, 211 are 1's, 93 are 2's, 35 are 3's, 7 are 4's and, finally, 1 is unknown but equal to or higher than 5, so we can consider it to be 5 (or even 6):
$$\hat\lambda = \frac{229\cdot 0 + 211\cdot 1 + 93\cdot 2 + 35\cdot 3 + 7\cdot 4 + 1\cdot 5}{576} = \frac{535}{576} \approx 0.93.$$
By using the plug-in principle and the calculator we obtain
$$\hat p_0 = P_{\hat\lambda}(X = 0) = f(0;\hat\lambda) = \frac{0.93^0}{0!}e^{-0.93} = 0.395, \qquad \hat p_1 = \frac{0.93^1}{1!}e^{-0.93} = 0.367, \qquad \hat p_2 = \frac{0.93^2}{2!}e^{-0.93} = 0.171,$$
$$\hat p_3 = \frac{0.93^3}{3!}e^{-0.93} = 0.0529, \qquad \hat p_4 = \frac{0.93^4}{4!}e^{-0.93} = 0.0123, \qquad \hat p_{\ge 5} = P_{\hat\lambda}(X \ge 5) = 1 - P_{\hat\lambda}(X \le 4) = 0.0027.$$

Poisson(λ̂ = 0.93)
Value          0       1       2       3        4        5 or more
Probability    0.395   0.367   0.171   0.0529   0.0123   0.0027

Now, we fill in the expected table by using the formula $\hat e_k = n\hat p_k$.

EXPECTED UNDER H0
Number of Bombs      0        1        2       3       4      5 or more   Total
Number of Regions    227.26   211.35   98.28   30.47   7.08   1.55        576

We have actually done these calculations with the programming language R. By using a calculator, some quantities may be slightly different due to technical effects (number of decimal digits, accuracy, etc.).

> dpois(c(0,1,2,3,4), 0.93)
[1] 0.39455371 0.36693495 0.17062475 0.05289367 0.01229778
> 1 - sum(dpois(c(0,1,2,3,4), 0.93))
[1] 0.002695135
> 576*dpois(c(0,1,2,3,4), 0.93)
[1] 227.262937 211.354531  98.279857  30.466756   7.083521
> 576*(1 - sum(dpois(c(0,1,2,3,4), 0.93)))
[1] 1.552398

To guarantee the quality of the chi-square methodology, the expected absolute frequencies are usually required to be larger than four (≥ 5). For this reason, we merge the last two classes in both the empirical and the expected tables.

EMPIRICAL
Number of Bombs      0     1     2    3    4 or more   Total
Number of Regions    229   211   93   35   8           576

EXPECTED UNDER H0
Number of Bombs      0        1        2       3       4 or more   Total
Number of Regions    227.26   211.35   98.28   30.47   8.64        576

We evaluate T0, which is necessary to apply any of the two methodologies:
$$T_0(x) = \frac{(229 - 227.26)^2}{227.26} + \cdots + \frac{(8 - 8.64)^2}{8.64} = 1.019.$$
We have calculated the value of T0 with the computer too:
> empirical = c(229, 211, 93, 35, 8)
> expected = 576*c(dpois(c(0,1,2,3), 0.93), 1 - sum(dpois(c(0,1,2,3), 0.93)))
> sum((empirical - expected)^2/expected)
[1] 1.018862
For this kind of test, the critical region always has the form $R_c = \{T_0 > a\}$.

Decision: There are K = 5 classes (after merging two of them) and s = 1 estimation, so
$$T_0 \xrightarrow{d} \chi^2_{K-1-s} = \chi^2_{5-1-1} = \chi^2_3.$$
If we apply the methodology based on the critical region, the necessary quantile a is calculated from the definition of the type I error, with the given α = 0.05:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) = P(T_0 > a) \;\rightarrow\; a = r_\alpha = 7.81 \;\rightarrow\; R_c = \{T_0 > 7.81\}$$
> qchisq(1-0.05, 3)
[1] 7.814728
Then, the decision is: $T_0(x) = 1.019 < 7.81$, so $T_0(x) \notin R_c$ and H0 is not rejected.
If we apply the alternative methodology based on the p-value,
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 1.019) = 0.80$$
Since pv = 0.80 > 0.05 = α, H0 is not rejected.
> 1 - pchisq(1.019, 3)
[1] 0.7966547
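The goodness-of-fit steps above (estimate λ, compute class probabilities with an upper-tail last class, merge sparse classes, and subtract one degree of freedom per estimated parameter) can be collected in a small sketch. The helper names `pois_class_probs` and `chisq_gof` are hypothetical, not part of base R. The p-value is computed by hand because base R's `chisq.test(x, p = ...)` uses K − 1 degrees of freedom and cannot know that λ was estimated from the same data.

```r
# Grouped bomb-hit counts for the classes 0, 1, 2, 3, 4 and "5 or more"
observed = c(229, 211, 93, 35, 7, 1)
values   = 0:5   # the open class "5 or more" is counted as 5, as in the text

# Sample mean as the estimator of lambda (535/576 = 0.9288..., rounded to 0.93 in the text)
lambda_hat = sum(values * observed) / sum(observed)

# Hypothetical helper: Poisson probabilities of the classes 0, ..., K-2 and "K-1 or more"
pois_class_probs = function(lambda, K) {
  p = dpois(0:(K - 2), lambda)
  c(p, 1 - sum(p))
}

# Hypothetical helper: chi-square goodness of fit with s estimated parameters (df = K - 1 - s)
chisq_gof = function(obs, probs, s) {
  expected = sum(obs) * probs
  T0 = sum((obs - expected)^2 / expected)
  df = length(obs) - 1 - s
  list(T0 = T0, df = df, expected = expected,
       p.value = pchisq(T0, df, lower.tail = FALSE))
}

# Merge "4" and "5 or more" so that every expected count is at least 5, then fit with s = 1
obs_merged = c(observed[1:4], sum(observed[5:6]))
res = chisq_gof(obs_merged, pois_class_probs(round(lambda_hat, 2), K = 5), s = 1)

print(round(res$expected, 2))
cat("T0 =", round(res$T0, 3), " df =", res$df, " p-value =", round(res$p.value, 3), "\n")
cat("a =", round(qchisq(0.95, res$df), 2), "\n")
```

Passing `pois_class_probs(0.8, K = 5)` with `s = 0` gives the fixed-λ fit treated in the next section, where no parameter is estimated.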

Fit to a member of the Poisson family
Hypotheses: For this nonparametric goodness-of-fit test, the hypotheses are
$H_0: N \sim F_0 = \text{Pois}(0.8)$ and $H_1: N \sim F \ne \text{Pois}(0.8)$
The null hypothesis can be thought of as simple, while the alternative hypothesis is composite. To fill in the expected table under H0, the formula $e_k = n p_k$ will be applied, where the probabilities can be taken from a table or can be calculated by substituting in the mass function $f(x;\lambda)$:
$$p_0 = P_\lambda(X = 0) = f(0;\lambda) = \frac{0.8^0}{0!}e^{-0.8} = 0.449, \qquad p_1 = \frac{0.8^1}{1!}e^{-0.8} = 0.359, \qquad p_2 = \frac{0.8^2}{2!}e^{-0.8} = 0.144,$$
$$p_3 = \frac{0.8^3}{3!}e^{-0.8} = 0.0383, \qquad p_4 = \frac{0.8^4}{4!}e^{-0.8} = 0.00767, \qquad p_{\ge 5} = P_\lambda(X \ge 5) = 1 - P_\lambda(X \le 4) = 0.0014.$$

Poisson(λ = 0.8)
Value          0       1       2       3        4         5 or more
Probability    0.449   0.359   0.144   0.0383   0.00767   0.0014

Now, we fill in the expected table by using the formula $e_k = n p_k$.

EXPECTED UNDER H0
Number of Bombs      0        1        2       3       4      5 or more   Total
Number of Regions    258.81   207.05   82.82   22.09   4.42   0.81        576

Again, we have done these calculations with the programming language R.
> dpois(c(0,1,2,3,4), 0.8)
[1] 0.449328964 0.359463171 0.143785269 0.038342738 0.007668548
> 1 - sum(dpois(c(0,1,2,3,4), 0.8))
[1] 0.00141131
> 576*dpois(c(0,1,2,3,4), 0.8)
[1] 258.813483 207.050787  82.820315  22.085417   4.417083
> 576*(1 - sum(dpois(c(0,1,2,3,4), 0.8)))
[1] 0.8129146
As in the previous case, we merge the last two classes so that all the expected absolute frequencies are larger than four.

EMPIRICAL
Number of Bombs      0     1     2    3    4 or more   Total
Number of Regions    229   211   93   35   8           576

EXPECTED UNDER H0
Number of Bombs      0        1        2       3       4 or more   Total
Number of Regions    258.81   207.05   82.82   22.09   5.23        576

We calculate the value of T0 with the computer as well:
> empirical = c(229, 211, 93, 35, 8)
> expected = 576*c(dpois(c(0,1,2,3), 0.8), 1 - sum(dpois(c(0,1,2,3), 0.8)))
> sum((empirical - expected)^2/expected)
[1] 13.77982

so
$$T_0(x) = \frac{(229 - 258.81)^2}{258.81} + \cdots + \frac{(8 - 5.23)^2}{5.23} = 13.78.$$
On the other hand, for this kind of test the critical region always has the form $R_c = \{T_0 > a\}$.

Decision: Now K = 5 and s = 0, since no estimation has been needed, so
$$T_0 \xrightarrow{d} \chi^2_{K-1-s} = \chi^2_{5-1-0} = \chi^2_4.$$
Now T0 follows the χ² distribution with 4 degrees of freedom (it was 3 in the previous section).
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) = P(T_0 > a) \;\rightarrow\; a = r_\alpha = 9.49 \;\rightarrow\; R_c = \{T_0 > 9.49\}$$
> qchisq(1-0.05, 4)
[1] 9.487729
Then, the decision is: $T_0(x) = 13.78 > 9.49$, so $T_0(x) \in R_c$ and H0 is rejected.
If we apply the methodology based on the p-value,
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 13.78) = 0.0080$$
Since pv = 0.0080 < 0.05 = α, H0 is rejected.
> 1 - pchisq(13.78, 4)
[1] 0.00803134

Conclusion: The hypothesis that bomb hits can reasonably be modeled by using the Poisson family has not been rejected. In this case, the data provided the estimate $\hat\lambda = 0.93$. Nevertheless, when the value λ = 0.8 is imposed, the hypothesis that bomb hits can be modeled by using a Pois(λ = 0.8) model is rejected. This shows that:
i. Even a quite reasonable model may not fit the data if inappropriate parameter values are considered. This emphasizes the importance of using good parameter estimation methods.
ii. Estimating the parameter value was better than fixing a value close to the estimate. As statisticians say: let the data talk.
This highlights the necessity of testing all suppositions, which implies that nonparametric procedures should sometimes be applied before the parametric ones: in this case, before supposing that the Poisson family is proper and imposing a value for the parameter, the whole Poisson family must be considered.
Remember: statistical results depend on the assumptions, the methods, the certainty and the data.

Advanced theory: Mendenhall, W., D.D. Wackerly and R.L. Scheaffer say (Mathematical Statistics with Applications, Duxbury Press) that the expected absolute frequencies can be as low as 1 for some situations, according to Cochran, W.G., The χ² Test of Goodness of Fit, Annals of Mathematical Statistics, 23 (1952) pp. 315-345. To take the most advantage of this exercise, we repeat the previous calculations without merging the last two classes.

Fit to the Poisson family
We evaluate T0, which is necessary to apply any of the two methodologies:
$$T_0(x) = \frac{(229 - 227.26)^2}{227.26} + \cdots + \frac{(1 - 1.55)^2}{1.55} = 1.17.$$
Now there are K = 6 classes and s = 1 estimation, so $T_0 \xrightarrow{d} \chi^2_{K-1-s} = \chi^2_{6-1-1} = \chi^2_4$. If we apply the methodology based on the critical region, the necessary quantile a is calculated from the definition of the type I error, with the given α = 0.05:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) = P(T_0 > a) \;\rightarrow\; a = r_\alpha = 9.49 \;\rightarrow\; R_c = \{T_0 > 9.49\}$$
> qchisq(1-0.05, 4)
[1] 9.487729
Then, the decision is: $T_0(x) = 1.17 < 9.49$, so $T_0(x) \notin R_c$ and H0 is not rejected.
If we apply the alternative methodology based on the p-value,
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 1.17) = 0.88$$
Since pv = 0.88 > 0.05 = α, H0 is not rejected.
> 1 - pchisq(1.17, 4)
[1] 0.8830128

Fit to a member of the Poisson family
We calculate the value of T0:
$$T_0(x) = \frac{(229 - 258.81)^2}{258.81} + \cdots + \frac{(1 - 0.81)^2}{0.81} = 13.87.$$
Since K = 6 and s = 0, now $T_0 \xrightarrow{d} \chi^2_{K-1-s} = \chi^2_{6-1-0} = \chi^2_5$. Then,
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) = P(T_0 > a) \;\rightarrow\; a = r_\alpha = 11.07 \;\rightarrow\; R_c = \{T_0 > 11.07\}$$
> qchisq(1-0.05, 5)
[1] 11.0705
Then, the decision is: $T_0(x) = 13.87 > 11.07$, so $T_0(x) \in R_c$ and H0 is rejected.
If we apply the methodology based on the p-value,
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 13.87) = 0.0165$$
Since pv = 0.0165 < 0.05 = α, H0 is rejected.
> 1 - pchisq(13.87, 5)
[1] 0.01645663
In both sections the same decisions have been made, which implies that this is one of those situations where merging the last two classes does not seem essential.

Exercise 3ht-np
Three financial products have been commercialized, and the presence of interest in them has been registered for some individuals. It is possible to imagine different situations where the following data could have been obtained.

           Product 1   Product 2   Product 3   Total
Group 1    10          18          9           37
Group 2    20          13          15          48
Total      30          31          24          85

(a) A simple random sample of 48 people of the second group were allocated after considering the variable product. Test at α = 0.01 whether this variable follows the distribution determined by the sample of the first group.

(b) A simple random sample of 85 people with interest in any of the products were allocated after considering the two variables group and product. Test at α = 0.01 the independence of the two variables.
(c) From two independent groups, simple random samples of 37 people and 48 people are surveyed, respectively. Test at α = 0.01 the homogeneity of the distribution of the variable product in the groups.

Discussion: In this exercise, the same table is looked at as containing data obtained from three different schemes. The chi-square methodology will be applied in all sections through three kinds of test: goodness-of-fit, independence and homogeneity. In the first case, a probability distribution F0 is specified, while in the last two cases the underlying distributions have no interest by themselves.

(a) Goodness-of-fit test
Statistic: To apply a goodness-of-fit test, from a table of statistics (e.g. in [T]) we select
$$T_0(X) = \sum_{k=1}^{K}\frac{(N_k - \hat e_k)^2}{\hat e_k} \xrightarrow{d} \chi^2_{K-1-s},$$
where there are K classes and s parameters must be estimated to determine the probabilities.
Hypotheses: For a nonparametric goodness-of-fit test, the null hypothesis assumes that the theoretical probabilities of the second group follow the probabilities determined by the sample of the first group. If $F_k$ represents the distribution of the variable product in the k-th population,
$H_0: F_2 = F_1$ and $H_1: F_2 \ne F_1$
The variable of the first group determines the following distribution F1:
Value          1       2       3
Probability    10/37   18/37   9/37
Now, under H0 the formula $e_k = n p_k$ (with n = 48) allows us to fill in the expected table:
Value          1                    2                    3
Expected       48·10/37 = 12.97     48·18/37 = 23.35     48·9/37 = 11.68
Then, we need the evaluation
$$T_0(x) = \frac{\left(20 - 48\frac{10}{37}\right)^2}{48\frac{10}{37}} + \frac{\left(13 - 48\frac{18}{37}\right)^2}{48\frac{18}{37}} + \frac{\left(15 - 48\frac{9}{37}\right)^2}{48\frac{9}{37}} = 9.34.$$
On the other hand, for this kind of test the critical region always has the form $R_c = \{T_0 > a\}$.

Decision: Since there are K = 3 classes and s = 0 (no parameter has to be estimated to determine the probabilities),
$$T_0 \xrightarrow{d} \chi^2_{K-1-s} = \chi^2_{3-1-0} = \chi^2_2.$$
If we apply the methodology based on the critical region, the necessary quantile a is calculated from the definition of the type I error, with the given α = 0.01:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) = P(T_0 > a) \;\rightarrow\; a = r_\alpha = 9.21 \;\rightarrow\; R_c = \{T_0 > 9.21\}$$
Then, the decision is: $T_0(x) = 9.34 > 9.21$, so $T_0(x) \in R_c$ and H0 is rejected.
If we apply the methodology based on the p-value (9.34 is not in our table while 9.21 is),
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 9.34) < P(T_0 > 9.21) = 1 - 0.99 = 0.01$$
Since pv < 0.01 = α, H0 is rejected.

(b) Independence test
Statistic: To apply a test of independence, from a table of statistics (e.g. in [T]) we select
$$T_0(X) = \sum_{l=1}^{L}\sum_{k=1}^{K}\frac{(N_{lk} - \hat e_{lk})^2}{\hat e_{lk}} \xrightarrow{d} \chi^2_{(L-1)(K-1)},$$
for L and K classes, respectively. Underlying distributions are supposed but not specified for the variables X and Y, and the probabilities are directly estimated from the sample information.
Hypotheses: For a nonparametric independence test, the null hypothesis assumes that the probability at any cell is the product of the marginal probabilities of its row and column,
$H_0: X, Y \text{ independent}$ and $H_1: X, Y \text{ dependent}$,
or, probabilistically,
$H_0: f(x, y) = f_X(x)\,f_Y(y)$ and $H_1: f(x, y) \ne f_X(x)\,f_Y(y)$.
Under H0, the formula $\hat e_{lk} = n\hat p_{lk} = n\hat p_l\,\hat p_k = N_{l\cdot}N_{\cdot k}/n$ allows us to fill in the expected table:
           Product 1   Product 2   Product 3
Group 1    13.06       13.49       10.45
Group 2    16.94       17.51       13.55
Then,
$$T_0(x) = \frac{\left(10 - \frac{37\cdot 30}{85}\right)^2}{\frac{37\cdot 30}{85}} + \cdots + \frac{\left(15 - \frac{48\cdot 24}{85}\right)^2}{\frac{48\cdot 24}{85}} = 4.29.$$
For this kind of test, the critical region always has the form $R_c = \{T_0 > a\}$.

Decision: There are L = 2 and K = 3 classes, respectively, so
$$T_0 \xrightarrow{d} \chi^2_{(L-1)(K-1)} = \chi^2_{(2-1)(3-1)} = \chi^2_2.$$
For the first methodology, to calculate a, the definition of the type I error is applied with α = 0.01:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) = P(T_0 > a) \;\rightarrow\; a = r_\alpha = 9.21 \;\rightarrow\; R_c = \{T_0 > 9.21\}$$
The decision is: $T_0(x) = 4.29 < 9.21$, so $T_0(x) \notin R_c$ and H0 is not rejected.
If we apply the methodology based on the p-value,
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 4.29) > P(T_0 > 4.61) = 1 - 0.90 = 0.1$$
Since pv > 0.1 > 0.01 = α, H0 is not rejected.

(c) Homogeneity test
Statistic: To apply a test of homogeneity, from a table of statistics (e.g. in [T]) we select
$$T_0(X) = \sum_{l=1}^{L}\sum_{k=1}^{K}\frac{(N_{lk} - \hat e_{lk})^2}{\hat e_{lk}} \xrightarrow{d} \chi^2_{(L-1)(K-1)},$$
for L groups and K classes. An underlying distribution is supposed but not specified for the variable X, and the probabilities are directly estimated from the sample information. Note that the membership of a group can be seen as the value of a factor.
Hypotheses: For a nonparametric homogeneity test, the null hypothesis assumes that the marginal probabilities in any column are the same for the two groups, that is, they are independent of the group or stratum. This means that the variable of interest follows the same probability distribution in each subgroup or stratum. If G represents the variable group, mathematically,
$H_0: F(x \mid G_1) = F(x \mid G_2)$ and $H_1: F(x \mid G_1) \ne F(x \mid G_2)$
Under H0, the formula $\hat e_{lk} = n_l\,\hat p_{k|l} = n_l\,\hat p_k = n_l N_{\cdot k}/n$ allows us to fill in the expected table, which turns out to be the same as in section (b).

Then
$$T_0(x) = \frac{\left(10 - \frac{37\cdot 30}{85}\right)^2}{\frac{37\cdot 30}{85}} + \cdots + \frac{\left(15 - \frac{48\cdot 24}{85}\right)^2}{\frac{48\cdot 24}{85}} = 4.29.$$
For this kind of test, the critical region always has the form $R_c = \{T_0 > a\}$.

Decision: For L = 2 groups and K = 3 classes,
$$T_0 \xrightarrow{d} \chi^2_{(L-1)(K-1)} = \chi^2_{(2-1)(3-1)} = \chi^2_2.$$
If we apply the methodology based on the critical region, to calculate the quantile a the definition of the type I error is applied with α = 0.01:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) = P(T_0 > a) \;\rightarrow\; a = r_\alpha = 9.21 \;\rightarrow\; R_c = \{T_0 > 9.21\}$$
To make the decision: $T_0(x) = 4.29 < 9.21$, so $T_0(x) \notin R_c$ and H0 is not rejected.
If we apply the methodology based on the p-value (4.29 is not in our table while 4.61 is),
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 4.29) > P(T_0 > 4.61) = 1 - 0.90 = 0.1$$
Since pv > 0.1 > 0.01 = α, H0 is not rejected.

Conclusion (advanced): Neither the independence nor the homogeneity has been rejected, while the hypothesis supposing that the variable product follows in population 2 the distribution determined by the sample of group 1 has been rejected. On the one hand, the distribution determined by one sample, involved in section (a), is in general different from the common supposed underlying distribution involved in sections (b) and (c), which is estimated by using the samples of both groups. Thus, it can be thought that this underlying distribution lies between the two samples, by which we can justify the decisions made in (a), (b) and (c). Group 2 has more weight in determining that distribution, since it has more elements. It is worth noticing the similarity between the independence and the homogeneity tests: same distribution and evaluation for the statistic, same critical region, et cetera. As regards the application of the methodologies, bounding the p-value is sometimes enough to discover whether it is smaller than α or not, but in general statisticians want to find its value.
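Sections (a), (b) and (c) can be cross-checked with base R's `chisq.test()`. This is a sketch: the row and column labels `G1`, `G2`, `P1`, `P2`, `P3` are illustrative, and in (a) the probabilities estimated from group 1 are passed as if fully specified by H0, matching s = 0 in the text.

```r
# Observed table: rows are the groups, columns are the products
tab = matrix(c(10, 18,  9,
               20, 13, 15),
             nrow = 2, byrow = TRUE,
             dimnames = list(Group = c("G1", "G2"), Product = c("P1", "P2", "P3")))

# (a) Goodness of fit of group 2 to the distribution given by the sample of group 1 (s = 0)
p_group1 = tab["G1", ] / sum(tab["G1", ])
gof = chisq.test(tab["G2", ], p = p_group1)

# (b) and (c): independence and homogeneity share the statistic and df = (L-1)(K-1) = 2
indep = chisq.test(tab)

a = qchisq(1 - 0.01, df = 2)   # critical value for alpha = 0.01
print(round(c(gof = unname(gof$statistic), indep = unname(indep$statistic), a = a), 2))
print(round(indep$expected, 2))
```

The same call answers (b) and (c) because, as noted above, the two tests share the statistic, its distribution and the critical region; only the sampling scheme and the interpretation differ.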

[HT] Parametric and Nonparametric

Exercise 1ht
To test whether a coin is fair or not, it has independently been tossed 100,000 times (the outputs are a simple random sample), and 50,347 of them were heads. Should the fairness of the coin, as null hypothesis, be rejected when α = 0.1?
a) Apply a parametric test. By using a computer, plot the power function.
b) Apply the nonparametric chi-square goodness-of-fit test.
c) Apply the nonparametric position signs test.

Discussion: In this exercise, no supposition should be evaluated: in a) because the Bernoulli model is the only proper one to model a coin, and in b) and c) because they involve nonparametric tests. The sections of this exercise need the same calculations as in previous exercises.

a) Parametric test
Statistic: From a table of statistics (e.g. in [T]), since the population variable is Bernoulli and the asymptotic framework can be considered (n is big), the statistic

$$T(X;\eta) = \frac{\hat\eta - \eta}{\sqrt{\frac{?(1-?)}{n}}} \xrightarrow{d} N(0,1)$$

is selected, where the symbol ? is substituted by the best information available. In testing hypotheses, it will be used in two forms:

$$T_0(X) = \frac{\hat\eta - \eta_0}{\sqrt{\frac{\eta_0(1-\eta_0)}{n}}} \xrightarrow{d} N(0,1) \qquad \text{and} \qquad T_1(X) = \frac{\hat\eta - \eta_1}{\sqrt{\frac{\eta_1(1-\eta_1)}{n}}} \xrightarrow{d} N(0,1)$$

where the supposed knowledge about the value of η is used in the denominators to estimate the variance (we do not have or suppose this information when T is used to build a confidence interval, or for tests with two populations). Regardless of the methodology to be applied, the following value will be necessary:

$$T_0(x) = \frac{\frac{50{,}347}{100{,}000} - \frac{1}{2}}{\sqrt{\frac{\frac{1}{2}\left(1-\frac{1}{2}\right)}{100{,}000}}} = 2.19$$

where η0 = 1/2 when the coin is supposed to be fair.

Hypotheses: Since a parametric test must be applied, the coin (a dichotomic situation) is modeled by a Bernoulli random variable, and the hypotheses are

$$H_0: \eta = \eta_0 = \tfrac{1}{2} \qquad \text{and} \qquad H_1: \eta = \eta_1 \neq \tfrac{1}{2}$$

Note that the question is about the value of the parameter η, while the Bernoulli distribution is supposed under both hypotheses; in some nonparametric tests this distribution is not even supposed in general, although the

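only reasonable distribution to model a coin is the Bernoulli. For this kind of alternative hypothesis, the critical region takes the form $R_c = \{|T_0| > a\}$.

Decision: To determine Rc, the quantiles are calculated from the type I error with α = 0.1 at η0 = 1/2:

$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T(X;\theta) \in R_c \mid H_0) \approx P(|T_0| > a) \;\rightarrow\; a = r_{\alpha/2} = 1.645 \;\rightarrow\; R_c = \{|T_0| > 1.645\}$$

Thus, the decision is: $|T_0(x)| = 2.19 > 1.645 \rightarrow T_0(x) \in R_c \rightarrow$ H0 is rejected.

If we apply the methodology based on the p-value,

$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(|T_0| > |T_0(x)|) = 2\,P(T_0 < -2.19) = 2 \cdot 0.0143 = 0.0286$$

so $pv = 0.0286 < 0.1 = \alpha \rightarrow$ H0 is rejected.

Power function: To calculate β, we have to work under H1. Since in this case the critical region is already expressed in terms of T0 and we must use T1, we apply the mathematical tricks of multiplying and dividing by the same quantity and of adding and subtracting the same quantity:

$$\beta(\eta_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0 \notin R_c \mid H_1) = P\left(-1.645 \le \frac{\hat\eta - \eta_0}{\sqrt{\frac{\eta_0(1-\eta_0)}{n}}} \le 1.645 \;\Big|\; H_1\right)$$

$$= P\left(\frac{-1.645\sqrt{\frac{\eta_0(1-\eta_0)}{n}} - (\eta_1 - \eta_0)}{\sqrt{\frac{\eta_1(1-\eta_1)}{n}}} \le \frac{\hat\eta - \eta_1}{\sqrt{\frac{\eta_1(1-\eta_1)}{n}}} \le \frac{1.645\sqrt{\frac{\eta_0(1-\eta_0)}{n}} - (\eta_1 - \eta_0)}{\sqrt{\frac{\eta_1(1-\eta_1)}{n}}} \;\Big|\; H_1\right)$$

$$= P\left(T_1 \le \frac{1.645\sqrt{\eta_0(1-\eta_0)} - \sqrt{n}\,(\eta_1 - \eta_0)}{\sqrt{\eta_1(1-\eta_1)}}\right) - P\left(T_1 < \frac{-1.645\sqrt{\eta_0(1-\eta_0)} - \sqrt{n}\,(\eta_1 - \eta_0)}{\sqrt{\eta_1(1-\eta_1)}}\right)$$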
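The decision and the type II error probability above can be reproduced numerically. The following is a minimal R sketch under the same normal approximation; `beta_fun` is a hypothetical helper name that simply codes the last expression for β(η1). R's `qnorm(0.95)` returns 1.6449 rather than the table value 1.645, so results may differ in the last decimal.

```r
# Two-sided asymptotic test for a Bernoulli proportion (data of the exercise)
n <- 100000
heads <- 50347
alpha <- 0.1
eta0 <- 0.5

eta_hat <- heads / n
se0 <- sqrt(eta0 * (1 - eta0) / n)
T0 <- (eta_hat - eta0) / se0          # about 2.19
a <- qnorm(1 - alpha / 2)             # about 1.645
reject <- abs(T0) > a
pv <- 2 * pnorm(-abs(T0))             # two-sided p-value

# Type II error probability beta(eta1) for the region {|T0| > a}
beta_fun <- function(eta1) {
  se1 <- sqrt(eta1 * (1 - eta1) / n)
  pnorm((a * se0 - (eta1 - eta0)) / se1) - pnorm((-a * se0 - (eta1 - eta0)) / se1)
}

cat("T0 =", T0, " reject H0:", reject, " p-value =", pv, "\n")
cat("beta(0.505) =", beta_fun(0.505), "\n")
```

At η1 = η0 the function returns 1 − α, which is a quick sanity check that the two bounds of the acceptance region are coded correctly.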

By using a computer, many more values η1 ≠ 0.5 can be considered to plot the power function

$$\phi(\eta) = P(\text{Reject } H_0) = \begin{cases} \alpha(\eta) & \text{if } \eta \in \Theta_0 \\ 1 - \beta(\eta) & \text{if } \eta \in \Theta_1 \end{cases}$$

```r
# Sample and inference
n = 100000
alpha = 0.1
theta0 = 0.5 # Value under the null hypothesis H0
q = qnorm(c(alpha/2, 1-alpha/2), 0, 1)
theta1 = seq(from=0, to=1, 0.01)
paramspace = sort(unique(c(theta1, theta0)))
PowerFunction = 1 - pnorm((q[2]*sqrt(theta0*(1-theta0)) - sqrt(n)*(paramspace-theta0))/sqrt(paramspace*(1-paramspace)), 0, 1) +
  pnorm((q[1]*sqrt(theta0*(1-theta0)) - sqrt(n)*(paramspace-theta0))/sqrt(paramspace*(1-paramspace)), 0, 1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
```

With this code the power function is plotted.

b) Nonparametric chi-square goodness-of-fit test
Statistic: To apply a goodness-of-fit test, from a table of statistics (e.g. in [T]) we select

$$T_0(X) = \sum_{k=1}^{K} \frac{(N_k - \hat e_k)^2}{\hat e_k} \xrightarrow{d} \chi^2_{K-s-1}$$

where there are K classes, and s parameters have to be estimated to determine F0 and hence the probabilities.

Hypotheses: For a nonparametric goodness-of-fit test, the null hypothesis supposes that the sample was generated by a Bernoulli distribution with η0 = 1/2, while the alternative hypothesis supposes that it was generated by a different distribution (Bernoulli or not, although the Bernoulli is here the reasonable way of modeling a coin):

$$H_0: F = F_0 = B(1/2) \qquad \text{and} \qquad H_1: F \neq B(1/2)$$

For the distribution F0, the table of probabilities is: value tail, probability 1/2; value head, probability 1/2. Under H0, the formula $\hat e_k = n\,p_k = n\,P_\theta(k\text{th class}) = 100{,}000 \cdot \frac{1}{2} = 50{,}000$ allows us to fill in the expected table:

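observed counts 50,347 (heads) and 49,653 (tails); expected counts 50,000 and 50,000. Then

$$T_0(x) = \sum_{k=1}^{2} \frac{(n_k - \hat e_k)^2}{\hat e_k} = \frac{(50{,}347 - 50{,}000)^2}{50{,}000} + \frac{(49{,}653 - 50{,}000)^2}{50{,}000} = 4.82$$

On the other hand, for this kind of test, the critical region always has the form $R_c = \{T_0 > a\}$.

Decision: There are K = 2 classes and s = 0 (no parameter has been estimated), so $T_0 \xrightarrow{d} \chi^2_{K-s-1} = \chi^2_{2-0-1} = \chi^2_1$. If we apply the methodology based on the critical region, the definition of type I error, with α = 0.1, is applied to calculate the quantile a:

$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) \approx P(T_0 > a) \;\rightarrow\; a = r_\alpha = 2.71 \;\rightarrow\; R_c = \{T_0 > 2.71\}$$

Then, the decision is: $T_0(x) = 4.82 > 2.71 \rightarrow T_0(x) \in R_c \rightarrow$ H0 is rejected.

If we apply the methodology based on the p-value,

$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 4.82) < P(T_0 > 3.84) = 0.05$$

so $pv < 0.05 < 0.1 = \alpha \rightarrow$ H0 is rejected.

Note: Bounding the p-value is sometimes enough to make the decision (4.82 is not in our table, while 3.84 is).

c) Nonparametric position sign test
Statistic: To apply a position sign test, from a table of statistics (e.g. in [T]) we select

$$T_0(X) = \text{Number}\{X_j - \theta_0 > 0\} \sim \text{Bin}\big(n,\; P(X_j > \theta_0)\big)$$

Here θ0 = 0 and P(Xj > 0) = 1/2, so Me(T0) = E(T0) = n/2.

Hypotheses: For a nonparametric position test, if head and tail are equivalently translated into the numbers 1 and −1, respectively, the hypotheses are

$$H_0: Me = \theta_0 = 0 \qquad \text{and} \qquad H_1: Me \neq \theta_0 = 0$$

For these hypotheses, the critical region takes the form $R_c = \{|T_0 - n/2| > a\}$.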

We need the evaluation $T_0(x) - n/2 = 50{,}347 - 100{,}000/2 = 347$.

Decision: In the first methodology, the quantile a is calculated by applying the definition of the type I error with α = 0.1. On the one hand, we know the distribution of T0, while, on the other hand, Rc was easily written in terms of T0 − n/2, whose distribution is involved in a well-known asymptotic result (the Central Limit Theorem for the Bin(n, 1/2)). Moreover, the probabilities of the binomial distribution are not tabulated for n = 100,000. Then,

$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 \in R_c) = P\left(|T_0 - n/2| > a\right) = P\left(\frac{|T_0 - n/2|}{\sqrt{n}/2} > \frac{a}{\sqrt{n}/2}\right) \approx P\left(|Z| > \frac{a}{\sqrt{n}/2}\right)$$

$$\rightarrow\; \frac{a}{\sqrt{n}/2} = r_{\alpha/2} = 1.645 \;\rightarrow\; a = 1.645\,\frac{\sqrt{100{,}000}}{2} = 260.097 \;\rightarrow\; R_c = \{|T_0 - n/2| > 260.097\}$$

The final decision is: $|T_0(x) - n/2| = 347 > 260.097 = a \rightarrow T_0(x) \in R_c \rightarrow$ H0 is rejected.

If we apply the methodology based on the p-value,

$$pv = P\left(|T_0 - n/2| > |T_0(x) - n/2|\right) \approx P\left(|Z| > \frac{50{,}347 - 50{,}000}{\sqrt{100{,}000}/2}\right) = P(|Z| > 2.19) = 2\,P(Z < -2.19) = 2 \cdot 0.0143 = 0.0286$$

so $pv = 0.0286 < 0.1 = \alpha \rightarrow$ H0 is rejected.

Conclusion: In this case the three different tests agree and lead to the same decision, but this may not happen in other situations. When it is possible to compare the power functions and there exists a uniformly most powerful test, the decision of the most powerful test should be considered. In general, proper parametric tests are expected to have more power than the nonparametric ones in testing the same hypotheses. With two classes, the chi-square test does not distinguish any two distributions whose two class probabilities are (½, ½), that is, in this case the test provides a decision about the symmetry of the distribution (chi-square tests work with class probabilities, not with the distributions themselves). In this exercise the parametric test and the nonparametric test of the signs are essentially the same.

Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.
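Sections b) and c) can be cross-checked with base R. This is a sketch, not the author's code: `chisq.test` with a probability vector `p` runs the goodness-of-fit test (no continuity correction is applied in that case), and `binom.test` gives an exact binomial p-value for the sign test, a swap for the normal approximation used in the text.

```r
# Chi-square goodness-of-fit test with K = 2 classes and no estimated parameters
observed <- c(head = 50347, tail = 49653)
gof <- chisq.test(observed, p = c(0.5, 0.5))
T0_chisq <- unname(gof$statistic)           # about 4.82
crit <- qchisq(1 - 0.1, df = 1)             # about 2.71

# Sign test: exact binomial p-value and the normal-approximation threshold a
n <- 100000
sign_exact <- binom.test(50347, n, p = 0.5)
a_sign <- qnorm(1 - 0.1 / 2) * sqrt(n) / 2  # about 260.07 (260.097 with 1.645)

cat("chi-square:", T0_chisq, "> ", crit, "\n")
cat("sign test exact p-value:", sign_exact$p.value, " threshold a:", a_sign, "\n")
```

The chi-square statistic equals the square of the parametric statistic, $(347/(\sqrt{n}/2))^2$, which is one way of seeing why, with two classes, the three tests carry essentially the same information.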

PE - CI - HT

Exercise 1pe-ci-ht
From a previous pilot study, concerning the monthly amount of money (in $) that male and female students of a community spend on cell phones, the following hypotheses are reasonably supported:
i. The variable amount of money follows a normal distribution in both populations.
ii. The population means are μM = $… and μF = $…, respectively.
iii. The two populations are independent.
Two independent simple random samples of sizes nM = 53 and nF = 49 are considered, from which the following statistics have been calculated: SM = $4.99 and SF = $5.01. Then,
A) Calculate the probability $P(\bar M - \bar F \le \ldots)$. Repeat the calculations with the supposition that σM and σF are equal.
B) Build a 95% confidence interval for the quotient σM/σF.

Discussion: The pilot statistical study mentioned in the statement should cover the evaluation of all suppositions. The hypothesis that σM = σF should be evaluated as well. The interval will be built by applying the method of the pivot.

Identification of the variables:
M = Amount of money spent by a male (one) ~ N(μM, σM²)
F = Amount of money spent by a female (one) ~ N(μF, σF²)

Selection of the statistics: We know that there are two independent normal populations, and that the standard deviations σM and σF are unknown and we compare them through σM/σF. From a table of statistics (e.g. in [T]) we select

$$T(M, F; \mu_M, \mu_F) = \frac{(\bar M - \bar F) - (\mu_M - \mu_F)}{\sqrt{\frac{S_M^2}{n_M} + \frac{S_F^2}{n_F}}} \sim t_\kappa \quad \text{with} \quad \kappa = \frac{\left(\frac{S_M^2}{n_M} + \frac{S_F^2}{n_F}\right)^2}{\frac{1}{n_M - 1}\left(\frac{S_M^2}{n_M}\right)^2 + \frac{1}{n_F - 1}\left(\frac{S_F^2}{n_F}\right)^2}$$

$$T(M, F; \mu_M, \mu_F) = \frac{(\bar M - \bar F) - (\mu_M - \mu_F)}{\sqrt{\frac{S_p^2}{n_M} + \frac{S_p^2}{n_F}}} \sim t_{n_M + n_F - 2} \quad \text{with} \quad S_p^2 = \frac{(n_M - 1)S_M^2 + (n_F - 1)S_F^2}{n_M + n_F - 2}$$

$$T(M, F; \sigma_M, \sigma_F) = \frac{S_M^2/\sigma_M^2}{S_F^2/\sigma_F^2} = \frac{S_M^2\,\sigma_F^2}{S_F^2\,\sigma_M^2} \sim F_{n_M - 1,\; n_F - 1}$$

Because of the information available, the first and the second statistics allow studying $\bar M - \bar F$ (the second for the particular case where σM = σF), while the third allows studying σM/σF.

A) Calculation of the probability:

$$P(\bar M - \bar F \le \ldots) = P\left(\frac{(\bar M - \bar F) - (\mu_M - \mu_F)}{\sqrt{\frac{S_M^2}{n_M} + \frac{S_F^2}{n_F}}} \le \frac{\ldots - (\mu_M - \mu_F)}{\sqrt{\frac{4.99^2}{53} + \frac{5.01^2}{49}}}\right) = P(T \le 1.29)$$

with $T \sim t_\kappa$, where

$$\kappa = \frac{\left(\frac{4.99^2}{53} + \frac{5.01^2}{49}\right)^2}{\frac{1}{53 - 1}\left(\frac{4.99^2}{53}\right)^2 + \frac{1}{49 - 1}\left(\frac{5.01^2}{49}\right)^2} = 99.33$$

Should we round this value downward, κ = 99, or upward, κ = 100? We will use this exercise to show that, for large values of κ1 and κ2, the t distributions provide close values, and that, for a large value of κ, the t distribution provides values close to those of the standard normal distribution (the t_κ distribution tends to the standard normal distribution as κ grows). By using the programming language R:

```r
pt(1.29, 99)   # kappa = 99.33 rounded to 99:  about 0.89997
pt(1.29, 100)  # kappa = 99.33 rounded to 100: about 0.89999
pnorm(1.29)    # standard normal N(0,1):       about 0.90147
```

On the other hand, when the variances are supposed to be equal, they can and should be estimated jointly by using the pooled sample variance:

$$S_p^2 = \frac{(n_M - 1)S_M^2 + (n_F - 1)S_F^2}{n_M + n_F - 2} = \frac{(53 - 1)(\$4.99)^2 + (49 - 1)(\$5.01)^2}{53 + 49 - 2} = \$^2\,25.00 \approx \$^2\,25$$

Then,

$$P(\bar M - \bar F \le \ldots) = P\left(T \le \frac{\ldots - (\mu_M - \mu_F)}{\sqrt{\frac{25}{53} + \frac{25}{49}}}\right) = P(T \le 1.29) \qquad \text{with } T \sim t_{53 + 49 - 2} = t_{100}$$

By using the table of the t distribution, the probability is 0.9. By using the language R, the probability is approximately 0.899.
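The two auxiliary quantities of section A), the Welch degrees of freedom κ and the pooled variance, can be sketched in R from the rounded sample statistics of the statement. The helper names `welch_df` and `pooled_var` are hypothetical. With SM = 4.99 and SF = 5.01 the sketch gives κ ≈ 99.3, close to the 99.33 quoted above (small differences can come from rounding of the inputs).

```r
# Welch-Satterthwaite degrees of freedom for the unequal-variance statistic
welch_df <- function(s1, s2, n1, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Pooled sample quasivariance for the equal-variance statistic
pooled_var <- function(s1, s2, n1, n2) {
  ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
}

kappa <- welch_df(4.99, 5.01, 53, 49)
sp2 <- pooled_var(4.99, 5.01, 53, 49)   # about 25 ($^2)

# Rounding kappa either way barely changes P(T <= 1.29); the normal is close too
probs <- c(t_floor   = pt(1.29, floor(kappa)),
           t_ceiling = pt(1.29, ceiling(kappa)),
           normal    = pnorm(1.29))
print(kappa)
print(sp2)
print(probs)
```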

B) Method of the pivotal quantity:

$$1 - \alpha = P\left(l_{\alpha/2} \le T \le r_{\alpha/2}\right) = P\left(l_{\alpha/2} \le \frac{S_M^2\,\sigma_F^2}{S_F^2\,\sigma_M^2} \le r_{\alpha/2}\right) = P\left(\frac{S_M^2}{r_{\alpha/2}\,S_F^2} \le \frac{\sigma_M^2}{\sigma_F^2} \le \frac{S_M^2}{l_{\alpha/2}\,S_F^2}\right)$$

so confidence intervals for σM²/σF² and σM/σF are respectively given by

$$I_{1-\alpha} = \left[\frac{S_M^2}{r_{\alpha/2}\,S_F^2},\; \frac{S_M^2}{l_{\alpha/2}\,S_F^2}\right] \qquad \text{and then} \qquad I_{1-\alpha} = \left[\frac{S_M}{S_F\sqrt{r_{\alpha/2}}},\; \frac{S_M}{S_F\sqrt{l_{\alpha/2}}}\right]$$

In the calculations, multiplying by a quantity and inverting can be applied in either order.

Substitution: We calculate the quantities in the formula: SM = $4.99 and SF = $5.01; 95% → 1 − α = 0.95 → α = 0.05 → α/2 = 0.025, so that l_{α/2} = 0.57 and r_{α/2} = 1.76 (in R, qf(c(0.025, 1-0.025), 53-1, 49-1) gives approximately 0.573 and 1.758). Then

$$I_{0.95} = \left[\frac{4.99}{5.01\sqrt{1.76}},\; \frac{4.99}{5.01\sqrt{0.57}}\right] = [0.75,\; 1.32]$$

Conclusion: First of all, in this case there is very little difference between the two ways of estimating the variance. On the other hand, as the variances are related through a quotient, the interpretation is not direct: the dimensionless, multiplicative factor c in σM = c·σF is, with 95% confidence, in the interval obtained. The interval (with dimensionless endpoints) contains the value 1, so it may happen that the variability of the amount of money spent is the same for males and females (we cannot reject this hypothesis; note that confidence intervals can be used to make decisions).

Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.

Exercise 2pe-ci-ht
The electric light bulbs of manufacturer 1 have a mean lifetime of 1400 hours (h), while those of manufacturer 2 have a mean lifetime of 1200 h. Simple random samples of 125 bulbs of each brand are tested. From these datasets the sample quasivariances Sx² = 156 h² and Sy² = 159 h² are computed. If the manufacturers are supposed to be independent and their lifetimes are supposed to be normally distributed:
a) Build a 99% confidence interval for the quotient of standard deviations σ1/σ2. Is the value σ1/σ2 = 1, that is, the case σ1 = σ2, included in the interval?
b) By using the proper statistic T, find k such that $P(\bar X - \bar Y \le k) = 0.4$.
Hint: (i) Firstly, build an interval for the quotient σ1²/σ2²; secondly, apply the positive square root function. (ii) If a random variable ξ follows an F_{124,124}, then P(ξ ≤ 0.628) = 0.005 and P(ξ ≤ 1.59) = 0.995.
(iii) If ξ follows a t_{248}, then P(ξ ≤ −0.25) = 0.4.
(Based on an exercise of Statistics, Spiegel, M.R., and L.J. Stephens, McGraw-Hill.)

LINGUISTIC NOTE (From: Longman Dictionary of Common Errors. Turton, N.D., and J.B. Heaton. Longman.)
electric means carrying, producing, produced by, powered by, or charged with electricity: 'an electric wire', 'an electric generator', 'an electric shock', 'an electric current', 'an electric light bulb', 'an electric toaster'. For machines and devices that are powered by electricity but do not have transistors, microchips, valves, etc, use electric (NOT electronic): 'an electric guitar', 'an electric train set', 'an electric razor'.

electrical means associated with electricity: 'electrical systems', 'a course in electrical engineering', 'an electrical engineer'. To refer to the general class of things that are powered by electricity, use electrical (NOT electric): 'electrical equipment', 'We stock all the latest electrical kitchen appliances'.
electronic is used to refer to equipment which is designed to work by means of an electric current passing through a large number of transistors, microchips, valves etc, and to components of this equipment: 'an electronic calculator', 'tiny electronic components'. Compare: 'an electronic calculator' BUT 'an electric oven'. An electronic system is one that uses equipment of this type: 'electronic surveillance', 'email' (electronic mail, a system for sending messages very quickly by means of computers).
electronics (WITH s) refers to the branch of science and technology concerned with the study, design or use of electronic equipment: 'a student of electronics'; (used as a modifier) anything that is connected with this branch: 'the electronics industry'.

Discussion: There are two independent normal populations. All suppositions should be evaluated. Their means are known, while their variances are estimated from samples of size 125. A 99% confidence interval for σ1/σ2 is required; it will be built by applying the method of the pivot. If the value σ1/σ2 = 1 belongs to this interval of confidence 0.99, the probability of the second section can reasonably be calculated under the supposition σ1 = σ2 = σ (this implies that the common variance σ² is jointly estimated by using the pooled sample quasivariance Sp²). On the other hand, this exercise shows the natural order in which the statistical techniques must sometimes be applied in practice: the supposition σ1 = σ2 is empirically supported by applying a confidence interval (or a hypothesis test) before using it in calculating the probability. Since the standard deviations have the same units of measurement as the data (hours), their quotient is dimensionless, and so are the endpoints of the interval.
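Identification of the variables:
X = Lifetime of a light bulb of manufacturer 1 ~ N(μ1 = 1400 h, σ1²)
Y = Lifetime of a light bulb of manufacturer 2 ~ N(μ2 = 1200 h, σ2²)

a) Confidence interval
Selection of the statistics: We know that there are two independent normal populations, and that the standard deviations σ1 and σ2 are unknown and we compare them through σ1/σ2. From a table of statistics (e.g. in [T]) we select a dimensionless statistic. To compare the variances of two independent normal populations, we have two candidates:

$$T(X, Y; \sigma_1, \sigma_2) = \frac{V_X^2/\sigma_1^2}{V_Y^2/\sigma_2^2} \sim F_{n_X,\; n_Y} \qquad \text{and} \qquad T(X, Y; \sigma_1, \sigma_2) = \frac{S_X^2/\sigma_1^2}{S_Y^2/\sigma_2^2} \sim F_{n_X - 1,\; n_Y - 1}$$

where $V_X^2 = \frac{1}{n_X}\sum_{j=1}^{n_X}(X_j - \mu_1)^2$ and $S_X^2 = \frac{1}{n_X - 1}\sum_{j=1}^{n_X}(X_j - \bar X)^2$, respectively (similarly for population 2). We would use the first if we were given $V_X^2$ and $V_Y^2$ or had enough information to calculate them (we know the means, but not the data themselves). In this exercise we can use only the second statistic.

Method of the pivot:

$$1 - \alpha = P\left(l_{\alpha/2} \le T \le r_{\alpha/2}\right) = P\left(l_{\alpha/2} \le \frac{S_X^2\,\sigma_2^2}{S_Y^2\,\sigma_1^2} \le r_{\alpha/2}\right) = P\left(\frac{S_X^2}{r_{\alpha/2}\,S_Y^2} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_X^2}{l_{\alpha/2}\,S_Y^2}\right)$$

so confidence intervals for σ1²/σ2² and σ1/σ2 are respectively given by

$$I_{1-\alpha} = \left[\frac{S_X^2}{r_{\alpha/2}\,S_Y^2},\; \frac{S_X^2}{l_{\alpha/2}\,S_Y^2}\right] \qquad \text{and} \qquad I_{1-\alpha} = \left[\sqrt{\frac{S_X^2}{r_{\alpha/2}\,S_Y^2}},\; \sqrt{\frac{S_X^2}{l_{\alpha/2}\,S_Y^2}}\right]$$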
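The interval for a quotient of standard deviations derived above only needs two F quantiles, so it can be sketched in R with `qf`. A minimal sketch, assuming the rounded sample statistics quoted in the two statements; `ci_sd_ratio` is a hypothetical helper name. It applies to both this exercise (99%) and the cell-phone exercise (95%).

```r
# Confidence interval for sigma1/sigma2 from two independent normal samples,
# based on (S1^2/sigma1^2) / (S2^2/sigma2^2) ~ F(n1 - 1, n2 - 1)
ci_sd_ratio <- function(s1, s2, n1, n2, conf) {
  alpha <- 1 - conf
  l <- qf(alpha / 2, n1 - 1, n2 - 1)       # lower quantile l_{alpha/2}
  r <- qf(1 - alpha / 2, n1 - 1, n2 - 1)   # upper quantile r_{alpha/2}
  c(lower = (s1 / s2) / sqrt(r), upper = (s1 / s2) / sqrt(l))
}

# Bulbs: Sx^2 = 156 h^2, Sy^2 = 159 h^2, samples of 125, 99% confidence
ci_bulbs <- ci_sd_ratio(sqrt(156), sqrt(159), 125, 125, conf = 0.99)
# Cell phones: S_M = 4.99, S_F = 5.01, n_M = 53, n_F = 49, 95% confidence
ci_phones <- ci_sd_ratio(4.99, 5.01, 53, 49, conf = 0.95)
print(ci_bulbs)
print(ci_phones)
```

Both intervals contain 1, which matches the decisions reached in the text.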

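In the previous calculations, multiplying by a quantity and inverting can be applied in either order.

Substitution: We calculate the quantities in the formula: Sx² = 156 h², Sy² = 159 h², and 99% → 1 − α = 0.99 → α = 0.01 → α/2 = 0.005, so that P(ξ ≤ l_{α/2}) = α/2 = 0.005 and P(ξ ≤ r_{α/2}) = 1 − α/2 = 0.995 give l_{α/2} = 0.628 and r_{α/2} = 1.59, where the information in (ii) of the hint has been used. Then

$$I_{0.99} = \left[\sqrt{\frac{156\ \text{h}^2}{1.59 \cdot 159\ \text{h}^2}},\; \sqrt{\frac{156\ \text{h}^2}{0.628 \cdot 159\ \text{h}^2}}\right] = [0.786,\; 1.25]$$

The value σ1/σ2 = 1 is in the interval of confidence 0.99 (99%), so the supposition σ1 = σ2 is strongly supported.

b) Probability
To work with the difference of the means of two independent normal populations when σ1 = σ2, we consider:

$$T(X, Y; \mu_1, \mu_2) = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_p^2}{n_X} + \frac{S_p^2}{n_Y}}} \sim t_{n_X + n_Y - 2}$$

where

$$S_p^2 = \frac{(n_X - 1)S_X^2 + (n_Y - 1)S_Y^2}{n_X + n_Y - 2} = \frac{124 \cdot 156\ \text{h}^2 + 124 \cdot 159\ \text{h}^2}{248} = 157.5\ \text{h}^2$$

is the pooled sample quasivariance. The quantile can be found after rewriting the event as follows:

$$0.4 = P(\bar X - \bar Y \le k) = P\left(\frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_p^2}{n_X} + \frac{S_p^2}{n_Y}}} \le \frac{k - (1400 - 1200)}{\sqrt{\frac{157.5}{125} + \frac{157.5}{125}}}\right) = P\left(T \le \frac{k - 200}{\sqrt{\frac{157.5}{125} + \frac{157.5}{125}}}\right)$$

Now, by using the information in (iii) of the hint,

$$l_{0.4} = -0.25 = \frac{k - 200\ \text{h}}{\sqrt{\frac{157.5\ \text{h}^2}{125} + \frac{157.5\ \text{h}^2}{125}}} \;\rightarrow\; k = 200\ \text{h} - 0.25\sqrt{\frac{2 \cdot 157.5\ \text{h}^2}{125}} = 199.60\ \text{h}$$

Conclusion: A confidence interval has been obtained for the quotient of the standard deviations. The dimensionless value of θ = σ1/σ2 is between 0.786 and 1.250 with 99% confidence; alternatively, as the standard deviations are related through a quotient, an equivalent interpretation is the following: the dimensionless multiplicative factor θ in σ1 = θ·σ2 is, with 99% confidence, in the interval obtained. Since the value θ = 1 is in this high-confidence interval, it may happen that the variability of the two lifetimes is the same (we cannot reject this hypothesis; note that confidence intervals can be used to make decisions); besides, it is reasonable to use the supposition σ1 = σ2 in calculating the probability of the second section. If any two simple random samples of size 125 were considered, the difference of the sample means would be smaller than 199.60 h with a probability of 0.4. Once two particular samples are substituted, randomness is not involved any more and the inequality $\bar x - \bar y \le k = 199.60$ is either true or false. The endpoints of the interval have no dimension, like the quotient σ1/σ2 or the multiplicative factor θ.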
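Section b) can be sketched in R as well: the pooled quasivariance, the quantile of the t distribution, and k. This is a sketch, not the author's code; R's `qt(0.4, 248)` is about −0.2536 rather than the rounded −0.25 of the hint, so k differs slightly in the third decimal. The short Monte Carlo check is an added assumption: it draws normal samples with σ1 = σ2 = √157.5 h, which is only a plausibility check of the 0.4 probability.

```r
# Pooled quasivariance and the quantile k of section b) (data of the exercise)
n1 <- 125
n2 <- 125
sx2 <- 156   # h^2
sy2 <- 159   # h^2
sp2 <- ((n1 - 1) * sx2 + (n2 - 1) * sy2) / (n1 + n2 - 2)   # 157.5 h^2
se <- sqrt(sp2 / n1 + sp2 / n2)

# P(Xbar - Ybar <= k) = 0.4  <=>  (k - (1400 - 1200)) / se = qt(0.4, n1 + n2 - 2)
k <- (1400 - 1200) + qt(0.4, df = n1 + n2 - 2) * se
cat("Sp^2 =", sp2, " k =", k, "\n")

# Monte Carlo check, ASSUMING sigma1 = sigma2 = sqrt(157.5) h
set.seed(2024)
sims <- replicate(20000, mean(rnorm(n1, 1400, sqrt(sp2))) - mean(rnorm(n2, 1200, sqrt(sp2))))
p_sim <- mean(sims <= k)
cat("simulated P(Xbar - Ybar <= k) =", p_sim, "\n")
```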
Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.

Exercise 3pe-ci-ht
In 1990, 25% of births were by mothers of more than 30 years of age. This year a simple random sample of size 120 births has been taken, yielding the result that 34 of them were by mothers of over 30 years of age.
a) With a significance of 10%, can it be accepted that the proportion of births by mothers of over 30 years of age is still 25%, against the alternative that it has increased? Select the statistic, write the critical region and make a decision. Calculate the p-value and make a decision. If the critical region is Rc = {η̂ > 0.30}, calculate β (the probability of the type II error) for η = 0.35. Plot the power function with the help of a computer.
b) Obtain a 90% confidence interval for the proportion. Use it to make a decision about the value of η, which is equivalent to having applied a two-sided (nondirectional) hypothesis test in the first section.
(The first half of section a) and the first calculation in b) are from 2007's exams for accessing the Spanish university; I have added the other parts.)

Discussion: In this exercise, no supposition should be evaluated. The number 30 plays a role only in defining the population under study. The Bernoulli model is the only proper one to register the presence or absence of a condition. Percents must be rewritten in a 0-to-1 scale. Since the default option is that the proportion has not changed, the equality is allocated in the null hypothesis. On the other hand, proportions are dimensionless by definition.

a) Hypothesis test
Statistic: From a table of statistics (e.g. in [T]), since the population variable follows the Bernoulli distribution and the asymptotic framework can be considered (n large), the statistic

$$T(X;\eta) = \frac{\hat\eta - \eta}{\sqrt{\frac{?(1-?)}{n}}} \xrightarrow{d} N(0,1)$$

is selected, where the symbol ? is substituted by the best information available: η or η̂. In testing hypotheses, it will be used in two forms:

$$T_0(X) = \frac{\hat\eta - \eta_0}{\sqrt{\frac{\eta_0(1-\eta_0)}{n}}} \xrightarrow{d} N(0,1) \qquad \text{and} \qquad T_1(X) = \frac{\hat\eta - \eta_1}{\sqrt{\frac{\eta_1(1-\eta_1)}{n}}} \xrightarrow{d} N(0,1)$$

where the supposed knowledge about the value of η is used in the denominators to estimate the variance (we do not have this information when T is used to build a confidence interval, as in the next section).
Regardless of the testing methodology to be applied, the evaluation of the statistic is necessary to make the decision. Since η0 = 0.25,

$$T_0(x) = \frac{\frac{34}{120} - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{120}}} = 0.843$$

Hypotheses:

$$H_0: \eta = \eta_0 = 0.25 \qquad \text{and} \qquad H_1: \eta = \eta_1 > 0.25$$

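For this alternative hypothesis, the critical region takes the form

$$R_c = \{\hat\eta > c\} = \left\{\frac{\hat\eta - \eta_0}{\sqrt{\frac{\eta_0(1-\eta_0)}{n}}} > \frac{c - \eta_0}{\sqrt{\frac{\eta_0(1-\eta_0)}{n}}}\right\} = \{T_0 > a\}$$

Decision: To determine Rc, the quantile is calculated from the type I error with α = 0.1 at η0 = 0.25:

$$\alpha = 0.1 = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T_0 > a) \;\rightarrow\; a = r_{0.1} = l_{0.9} = 1.28 \;\rightarrow\; R_c = \{T_0 > 1.28\}$$

Now, the decision is: $T_0(x) = 0.843 < 1.28 \rightarrow T_0(x) \notin R_c \rightarrow$ H0 is not rejected.

p-value:

$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 > T_0(x)) = P(T_0 > 0.843) = 0.200$$

so $pv = 0.200 > 0.1 = \alpha \rightarrow$ H0 is not rejected.

Type II error: To calculate β, we have to work under H1. Since the critical region has been expressed in terms of T0, and we must use T1, we could apply the mathematical trick of adding and subtracting the same quantity. Nevertheless, that way is useful when the value c in Rc = {η̂ > c} has not been calculated yet; now, since we have been told that Rc = {η̂ > 0.3}, it is easier to standardize directly with η1:

$$\beta(\eta_1) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0 \notin R_c \mid H_1) = P(\hat\eta \le 0.3 \mid H_1) = P\left(\frac{\hat\eta - \eta_1}{\sqrt{\frac{\eta_1(1-\eta_1)}{n}}} \le \frac{0.3 - \eta_1}{\sqrt{\frac{\eta_1(1-\eta_1)}{n}}} \;\Big|\; H_1\right) = P\left(T_1 \le \frac{0.3 - \eta_1}{\sqrt{\frac{\eta_1(1-\eta_1)}{n}}}\right)$$

For the particular value η1 = 0.35,

$$\beta(0.35) = P\left(T_1 \le \frac{0.3 - 0.35}{\sqrt{\frac{0.35(1 - 0.35)}{120}}}\right) = P(T_1 \le -1.15) = 0.125$$

(in R, pnorm(-1.15, 0, 1) gives 0.125). By using a computer, many more values η1 ≠ 0.25 can be considered to plot the power function

$$\phi(\eta) = P(\text{Reject } H_0) = \begin{cases} \alpha(\eta) & \text{if } \eta \in \Theta_0 \\ 1 - \beta(\eta) & \text{if } \eta \in \Theta_1 \end{cases}$$

```r
# Sample and inference
n = 120
alpha = 0.1
theta0 = 0.25 # Value under the null hypothesis H0
c = 0.3
theta1 = seq(from=0.25, to=1, 0.01)
paramspace = sort(unique(c(theta1, theta0)))
PowerFunction = 1 - pnorm((c-paramspace)/sqrt(paramspace*(1-paramspace)/n), 0, 1)
plot(paramspace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')
```

This code generates the power function.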
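The numbers of section a), together with the 90% interval of section b), can be reproduced in a few lines of R. This is a sketch under the same normal approximations as the text; `beta_fun` is a hypothetical helper name, and the cutoff 0.3 is the one given in the statement. Because R uses unrounded quantiles and η̂ = 34/120 exactly, the last decimal can differ slightly from the hand calculations.

```r
# One-sided test for a proportion (data of the exercise)
n <- 120
successes <- 34
eta0 <- 0.25
alpha <- 0.1

eta_hat <- successes / n
T0 <- (eta_hat - eta0) / sqrt(eta0 * (1 - eta0) / n)   # about 0.843
a <- qnorm(1 - alpha)                                   # about 1.28
pv <- pnorm(T0, lower.tail = FALSE)                     # about 0.20

# Type II error for the region {eta_hat > 0.3}
beta_fun <- function(eta1, cutoff = 0.3) {
  pnorm((cutoff - eta1) / sqrt(eta1 * (1 - eta1) / n))
}
beta_035 <- beta_fun(0.35)                              # about 0.125

# 90% interval of section b), with eta estimated in the denominator
r <- qnorm(1 - alpha / 2)
half <- r * sqrt(eta_hat * (1 - eta_hat) / n)
ci <- c(eta_hat - half, eta_hat + half)
cat("T0 =", T0, " pv =", pv, " beta(0.35) =", beta_035, "\n")
cat("90% CI:", ci, "\n")
```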

b) Confidence interval
Statistic: From a table of statistics (e.g. in [T]), the same statistic is selected,

$$T(X;\eta) = \frac{\hat\eta - \eta}{\sqrt{\frac{?(1-?)}{n}}} \xrightarrow{d} N(0,1)$$

where the symbol ? is substituted by the best information available. In testing hypotheses we were also studying the unknown quantity η, although it was provisionally supposed to be known under the hypotheses; for confidence intervals we are not working under any hypothesis, and η must be estimated in the denominator:

$$T(X;\eta) = \frac{\hat\eta - \eta}{\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}} \xrightarrow{d} N(0,1)$$

The interval is obtained with the same calculations as in previous exercises involving a Bernoulli population,

$$I_{1-\alpha} = \left[\hat\eta - r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}},\; \hat\eta + r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}\right]$$

where r_{α/2} is the value of the standard normal distribution such that P(Z > r_{α/2}) = α/2. By using the sample proportion η̂ = 34/120 = 0.283 and 90% → 1 − α = 0.9 → α = 0.1 → α/2 = 0.05 → r_{0.05} = l_{0.95} = 1.645, the particular interval for these data is

$$I_{0.9} = \left[0.283 - 1.645\sqrt{\frac{0.283(1 - 0.283)}{120}},\; 0.283 + 1.645\sqrt{\frac{0.283(1 - 0.283)}{120}}\right] = [0.215,\; 0.351]$$

Thinking about the interval as an acceptance region, since η0 = 0.25 ∈ I, the hypothesis that η may still be 0.25 is not rejected.

Conclusion: With 90% confidence, the proportion of births by mothers of over 30 years of age seems to be 0.25 at most. The same decision is still made by considering the confidence interval that would correspond to

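a two-sided (nondirectional) test with the same confidence, that is, by allowing the new proportion to be different because it had severely increased or decreased.

Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.

Exercise 4pe-ci-ht
A random quantity X is supposed to follow a distribution whose probability function is, for θ > 0,

$$f(x;\theta) = \begin{cases} \theta\, x^{\theta - 1} & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

A) Apply the method of the moments to find an estimator of the parameter θ.
B) Apply the maximum likelihood method to find an estimator of the parameter θ.
C) Use the estimators obtained to build others for the mean μ and the variance σ².
D) Let X = (X1, ..., Xn) be a simple random sample. By applying the results involving Neyman-Pearson's lemma and the likelihood ratio, study the critical region for the following pairs of hypotheses:

$$\begin{cases} H_0: \theta = \theta_0 \\ H_1: \theta = \theta_1 \end{cases} \quad \begin{cases} H_0: \theta = \theta_0 \\ H_1: \theta = \theta_1 > \theta_0 \end{cases} \quad \begin{cases} H_0: \theta = \theta_0 \\ H_1: \theta = \theta_1 < \theta_0 \end{cases} \quad \begin{cases} H_0: \theta \le \theta_0 \\ H_1: \theta = \theta_1 > \theta_0 \end{cases} \quad \begin{cases} H_0: \theta \ge \theta_0 \\ H_1: \theta = \theta_1 < \theta_0 \end{cases}$$

Hint: Use that E(X) = θ/(θ+1) and E(X²) = θ/(θ+2).

Discussion: This statement is basically mathematical. The random variable X is dimensionless. This probability distribution, with standard power function density, is a particular case of the Beta distribution.

Note: If E(X) had not been given in the statement, it could have been calculated by integrating:

$$E(X) = \int_{-\infty}^{\infty} x\, f(x;\theta)\, dx = \int_0^1 x\,\theta\, x^{\theta - 1}\, dx = \theta \left[\frac{x^{\theta + 1}}{\theta + 1}\right]_0^1 = \frac{\theta}{\theta + 1}$$

Besides, E(X²) could have been calculated as follows:

$$E(X^2) = \int_0^1 x^2\,\theta\, x^{\theta - 1}\, dx = \theta \left[\frac{x^{\theta + 2}}{\theta + 2}\right]_0^1 = \frac{\theta}{\theta + 2}$$

Now, $\mu = E(X) = \frac{\theta}{\theta + 1}$ and $\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \frac{\theta}{\theta + 2} - \frac{\theta^2}{(\theta + 1)^2} = \frac{\theta}{(\theta + 2)(\theta + 1)^2}$.

A) Method of the moments
a1) Population and sample moments: There is only one parameter, so one equation is needed. The first-order moments of the model X and the sample x are, respectively,

$$\mu_1(\theta) = E(X) = \frac{\theta}{\theta + 1} \qquad \text{and} \qquad m_1(x_1, x_2, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$$

a2) System of equations: Since the parameter of interest θ appears in the first-order moment of X, the first equation suffices:

$$\mu_1(\theta) = m_1(x_1, \ldots, x_n) \;\rightarrow\; \frac{\theta}{\theta + 1} = \bar x \;\rightarrow\; \theta = \theta\bar x + \bar x \;\rightarrow\; \theta(1 - \bar x) = \bar x \;\rightarrow\; \theta = \frac{\bar x}{1 - \bar x}$$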

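a3) The estimator:

$$\hat\theta_M = \frac{\bar X}{1 - \bar X}$$

B) Maximum likelihood method
b1) Likelihood function: For this probability distribution, the density function is $f(x;\theta) = \theta\, x^{\theta - 1}$, so

$$L(x_1, x_2, \ldots, x_n; \theta) = \prod_{j=1}^n f(x_j;\theta) = \prod_{j=1}^n \theta\, x_j^{\theta - 1} = \theta^n \left(\prod_{j=1}^n x_j\right)^{\theta - 1}$$

b2) Optimization problem: The logarithm function is applied to make calculations easier:

$$\log L(x_1, \ldots, x_n; \theta) = n \log\theta + (\theta - 1)\log\prod_{j=1}^n x_j$$

To find the local or relative extreme values, the necessary condition is:

$$0 = \frac{d}{d\theta}\log L(x_1, \ldots, x_n; \theta) = \frac{n}{\theta} + \log\prod_{j=1}^n x_j \;\rightarrow\; \theta_0 = -\frac{n}{\log\prod_{j=1}^n x_j}$$

To verify that the only candidate is a local maximum, the sufficient condition is:

$$\frac{d^2}{d\theta^2}\log L(x_1, \ldots, x_n; \theta) = \frac{d}{d\theta}\left[\frac{n}{\theta} + \log\prod_{j=1}^n x_j\right] = -\frac{n}{\theta^2} < 0$$

The second derivative is always negative, also at the value θ0.

b3) The estimator:

$$\hat\theta_{ML} = -\frac{n}{\log\prod_{j=1}^n X_j} = -\frac{n}{\sum_{j=1}^n \log X_j}$$

C) Estimation of μ and σ²
c1) For the mean: By applying the plug-in principle,
From the method of the moments: $\hat\mu_M = \frac{\hat\theta_M}{\hat\theta_M + 1} = \bar X$.
From the maximum likelihood method: $\hat\mu_{ML} = \frac{\hat\theta_{ML}}{\hat\theta_{ML} + 1} = \frac{n}{n - \log\prod_{j=1}^n X_j}$.
c2) For the variance: Instead of substituting in the large expression of σ², we use functional notation.
From the method of the moments: $\hat\sigma^2_M = \sigma^2(\hat\theta_M)$, with $\sigma^2(\theta) = \frac{\theta}{(\theta + 2)(\theta + 1)^2}$ and $\hat\theta_M$ given above.
From the maximum likelihood method: $\hat\sigma^2_{ML} = \sigma^2(\hat\theta_{ML})$, with $\sigma^2(\theta) = \frac{\theta}{(\theta + 2)(\theta + 1)^2}$ and $\hat\theta_{ML}$ given above.

D) Neyman-Pearson's lemma and likelihood ratio
d1) For the hypotheses $H_0: \theta = \theta_0$ and $H_1: \theta = \theta_1$: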

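The likelihood function and the likelihood ratio are

$$L(X;\theta) = \theta^n \left(\prod_{j=1}^n X_j\right)^{\theta - 1} \qquad \text{and} \qquad \Lambda(X;\theta_0,\theta_1) = \frac{L(X;\theta_0)}{L(X;\theta_1)} = \left(\frac{\theta_0}{\theta_1}\right)^n \left(\prod_{j=1}^n X_j\right)^{\theta_0 - \theta_1}$$

Then, the critical or rejection region is

$$R_c = \{\Lambda < k\} = \left\{n\log\frac{\theta_0}{\theta_1} + (\theta_0 - \theta_1)\log\prod_{j=1}^n X_j < \log k\right\} = \left\{(\theta_0 - \theta_1)\log\prod_{j=1}^n X_j < \log k - n\log\frac{\theta_0}{\theta_1}\right\}$$

Now it is necessary that θ1 ≠ θ0, and, recalling that $\hat\theta_{ML} = -n/\log\prod_{j} X_j$ with $\log\prod_j X_j < 0$ (the last step in each case assumes the right-hand side is negative as well):
if θ1 < θ0, then θ0 − θ1 > 0 and hence

$$R_c = \left\{\log\prod_{j=1}^n X_j < \frac{\log k - n\log(\theta_0/\theta_1)}{\theta_0 - \theta_1}\right\} = \left\{\hat\theta_{ML} < \frac{-n(\theta_0 - \theta_1)}{\log k - n\log(\theta_0/\theta_1)}\right\}$$

if θ1 > θ0, then θ0 − θ1 < 0 and hence

$$R_c = \left\{\log\prod_{j=1}^n X_j > \frac{\log k - n\log(\theta_0/\theta_1)}{\theta_0 - \theta_1}\right\} = \left\{\hat\theta_{ML} > \frac{-n(\theta_0 - \theta_1)}{\log k - n\log(\theta_0/\theta_1)}\right\}$$

This suggests regions of the form

$$R_c = \{\Lambda < k\} = \{\hat\theta_{ML} < c\} \qquad \text{or} \qquad R_c = \{\Lambda < k\} = \{\hat\theta_{ML} > c\}$$

The form of the critical region can qualitatively be justified as follows: if θ1 < θ0, the hypothesis H1 will be accepted when an estimator of θ is in the lower tail, and vice versa.

d2) Hypothesis tests $\{H_0: \theta = \theta_0;\; H_1: \theta = \theta_1 > \theta_0\}$ and $\{H_0: \theta = \theta_0;\; H_1: \theta = \theta_1 < \theta_0\}$: In applying the methodologies, the same critical value c will be obtained for any θ1, since it only depends upon θ0 through $\hat\theta_{ML}$:

$$\alpha = P(\text{Type I error}) = P(\hat\theta_{ML} < c \mid \theta_0) \qquad \text{or} \qquad \alpha = P(\text{Type I error}) = P(\hat\theta_{ML} > c \mid \theta_0)$$

This implies that the uniformly most powerful test has been found.

d3) Hypothesis tests $\{H_0: \theta \le \theta_0;\; H_1: \theta = \theta_1 > \theta_0\}$ and $\{H_0: \theta \ge \theta_0;\; H_1: \theta = \theta_1 < \theta_0\}$: A uniformly most powerful test for H0: θ = θ0 is also uniformly most powerful for H0: θ ≤ θ0 (respectively, H0: θ ≥ θ0).

Conclusion: For the probability distribution determined by the function given, two methods of point estimation have been applied. In this case, the two methods provide different estimators. By applying the plug-in principle, estimators of the mean and the variance have also been obtained. The form of the critical region has been studied by applying Neyman-Pearson's lemma and the likelihood ratio.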
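The two estimators of this exercise are one-liners, so a simulation is a cheap way to see that both recover θ. The sketch below is illustrative only: the function names are hypothetical, and it ASSUMES the fact, not stated in the text, that the density θx^(θ−1) on [0, 1] is the Beta(θ, 1) density, so base R's `rbeta(n, shape1 = theta, shape2 = 1)` can generate data from it.

```r
# Method-of-moments and maximum likelihood estimators for f(x; theta) = theta * x^(theta - 1)
theta_mom <- function(x) mean(x) / (1 - mean(x))
theta_ml <- function(x) -length(x) / sum(log(x))

# X ~ Beta(theta, 1) has density theta * x^(theta - 1) on [0, 1]
set.seed(123)
theta_true <- 2
x <- rbeta(100000, shape1 = theta_true, shape2 = 1)

estimates <- c(moments = theta_mom(x), max_lik = theta_ml(x))
# Plug-in estimates of the mean mu = theta / (theta + 1)
mu_hat <- estimates / (estimates + 1)
print(estimates)
print(mu_hat)
```

As derived in c1), the plug-in mean from the method of the moments collapses back to the sample mean, while the maximum likelihood version does not.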

Additional Exercises

Exercise 1ae
Assume that the height (in centimeters, cm) of any student of a group follows a normal distribution with variance 55 cm². If a simple random sample of 25 students is considered, calculate the probability that the sample quasivariance will be bigger than 64.65 cm².

Discussion: In this exercise, the supposition that the normal distribution reasonably explains the variable height should be evaluated by using proper statistical techniques.

Identification of the variable and selection of the statistic: The variable is the height, the population distribution is normal, the sample size is 25, and we are asked for the probability of an event expressed in terms of one of the usual statistics: P(S² > 64.65).

Search for a known distribution: Since we do not know the sampling distribution of S², we cannot calculate this probability directly. Instead, just after reading 'sample quasivariance' we should think about the following theoretical result:

$$T = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad \text{or, in this case,} \qquad T = \frac{(25 - 1)S^2}{55\ \text{cm}^2} \sim \chi^2_{25-1}$$

Rewriting the event: The event has to be rewritten by completing some terms until the dimensionless statistic T appears. Additionally, when the table of the χ² distribution gives lower-tail probabilities P(X ≤ x), it is necessary to consider the complementary event:

$$P(S^2 > 64.65) = P\left(\frac{(25 - 1)S^2}{55\ \text{cm}^2} > \frac{(25 - 1)\,64.65\ \text{cm}^2}{55\ \text{cm}^2}\right) = P(T > 28.21) = 1 - P(T \le 28.21) = 1 - 0.75 = 0.25$$

In these calculations, one property of the transformations has been applied: multiplying or dividing by a positive quantity does not modify an inequality.

Conclusion: The probability of the event is 0.25. This means that S² will sometimes (with probability 0.25) take a value bigger than 64.65 cm² when evaluated at specific data x coming from the population distribution.

Exercise 2ae
Let X be a random variable with probability function

$$f(x;\theta) = \frac{\theta\, x^{\theta - 1}}{3^\theta}, \qquad x \in [0, 3]$$

such that E(X) = 3θ/(θ+1). Given a simple random sample X = (X1, ..., Xn), apply the method of the moments to find an estimator θ̂_M of the parameter θ.

Discussion: This statement is mathematical. Although it is given, the expectation of X could be calculated as follows:

$$\mu_1(\theta) = E(X) = \int_{-\infty}^{\infty} x\, f(x;\theta)\, dx = \int_0^3 x\, \frac{\theta\, x^{\theta - 1}}{3^\theta}\, dx = \frac{\theta}{3^\theta}\left[\frac{x^{\theta + 1}}{\theta + 1}\right]_0^3 = \frac{\theta}{3^\theta}\cdot\frac{3^{\theta + 1}}{\theta + 1} = \frac{3\theta}{\theta + 1}$$

Method of the moments
Population and sample moments: The first-order moments are

$$\mu_1(\theta) = \frac{3\theta}{\theta + 1} \qquad \text{and} \qquad m_1(x_1, x_2, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar x$$

System of equations: Since the parameter θ appears in the first-order moment of X, the first equation is sufficient to apply the method:

$$\mu_1(\theta) = m_1(x_1, \ldots, x_n) \;\rightarrow\; \frac{3\theta}{\theta + 1} = \bar x \;\rightarrow\; 3\theta = \theta\bar x + \bar x \;\rightarrow\; \theta(3 - \bar x) = \bar x \;\rightarrow\; \theta = \frac{\bar x}{3 - \bar x}$$

The estimator:

$$\hat\theta_M = \frac{\bar X}{3 - \bar X}$$

Exercise 3ae
A poll of 1000 individuals over the age of 65 years, being a simple random sample, was taken to determine the percent of the population in this age group who had an Internet connection. It was found that 387 of the 1000 had one. Find a 95% confidence interval for η.
(Taken from an exercise of Statistics, Spiegel and Stephens, McGraw-Hill.)

Discussion: Asymptotic results can be applied for this large sample of a Bernoulli population. The cutoff age value determines the population of the statistical analysis, but it plays no other role. Both η and η̂ are dimensionless.

Identification of the variable: Having the connection or not is a dichotomic situation; then X = Connected (an individual)? ~ Ber(η).

Pivot: We take into account that there is one Bernoulli population, and that the sample size is big, n = 1000, so an asymptotic approximation can be applied. A statistic is selected from a table (e.g. in [T]):

$$T(X;\eta) = \frac{\hat\eta - \eta}{\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}} \xrightarrow{d} N(0,1)$$

Event rewriting:

$$1 - \alpha = P\left(l_{\alpha/2} \le T(X;\eta) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{\hat\eta - \eta}{\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}} \le r_{\alpha/2}\right) = P\left(\hat\eta - r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}} \le \eta \le \hat\eta + r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}\right)$$

The interval: Then,

$$I_{1-\alpha} = \left[\hat\eta - r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}},\; \hat\eta + r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}\right]$$

where r_{α/2} is the value of the standard normal distribution verifying P(Z > r_{α/2}) = α/2.

Substitution: We need to calculate the quantities involved in the previous formula: n = 1000; theoretical simple random sample X1, ..., X1000 (each value is 1 or 0); empirical sample x1, ..., x1000 with $\sum_{j=1}^{1000} x_j = 387$, so $\hat\eta = \frac{387}{1000} = 0.387$; and 95% → 1 − α = 0.95 → α = 0.05 → α/2 = 0.025 → r_{α/2} = 1.96. Finally,

$$I_{0.95} = \left[0.387 - 1.96\sqrt{\frac{0.387(1 - 0.387)}{1000}},\; 0.387 + 1.96\sqrt{\frac{0.387(1 - 0.387)}{1000}}\right] = [0.357,\; 0.417]$$

Conclusion: With 95% confidence, the unknown proportion of individuals over the age of 65 years with an Internet connection is inside the range [0.357, 0.417]. Perhaps a 0-to-100 scale facilitates the interpretation: the percent of individuals is in [35.7%, 41.7%] with 95% confidence. Proportions and probabilities are always dimensionless quantities, even when expressed in percent.

Exercise 4ae
A company is interested in studying its clients' behaviour. For this purpose, the mean time between consecutive demands of service is modelized by a random variable whose density function is:

$$f(x;\theta) = \frac{1}{\theta}\, e^{-\frac{x - \ldots}{\theta}}, \qquad x \ge \ldots, \qquad \theta > 0$$

The estimator provided by the method of the moments is θ̂_M = X̄ − ….

1st) Is it an unbiased estimator of the parameter? Why?
2nd) Calculate its mean square error. Is it a consistent estimator of the parameter? Why?
Note: $E(X) = 1 + \theta$ and $Var(X) = \theta^2$.

Discussion: The two sections are based on the calculation of the mean and the variance of the estimator given in the statement. Then, the formulas of the bias and the mean square error must be used. Finally, the limit of the mean square error is studied.

Mean and variance of $\hat\theta_M$:
$$E(\hat\theta_M) = E(\bar{X} - 1) = E(\bar{X}) - 1 = E(X) - 1 = (1+\theta) - 1 = \theta$$
$$Var(\hat\theta_M) = Var(\bar{X} - 1) = Var(\bar{X}) = \frac{Var(X)}{n} = \frac{\sigma^2}{n} = \frac{\theta^2}{n}$$

Unbiasedness: The estimator is unbiased, as the expression of the mean shows. Alternatively, we calculate the bias:
$$b(\hat\theta_M) = E(\hat\theta_M) - \theta = \theta - \theta = 0.$$

Mean square error and consistency: Then,
$$MSE(\hat\theta_M) = b(\hat\theta_M)^2 + Var(\hat\theta_M) = 0 + \frac{\theta^2}{n} = \frac{\theta^2}{n}.$$
The population variance $\theta^2$ does not depend on the sample, in particular on the sample size $n$. Then,
$$\lim_{n\to\infty} MSE(\hat\theta_M) = \lim_{n\to\infty} \frac{\theta^2}{n} = 0.$$

Note: In general, the population variance can be finite or infinite (for some strange probability distributions we do not consider in this subject). If the variance is infinite, $\sigma^2 = \infty$, neither $Var(\hat\theta_M)$ nor $MSE(\hat\theta_M)$ exists, in the sense that they are infinite; in this particular exercise it is finite, $\theta^2 < \infty$. In the former case, the mean square error would not exist and the consistency in probability could not be studied in this way. In the latter case, the mean square error exists and tends to zero (consistency in the mean-square sense), which is sufficient for the estimator of $\theta$ to be consistent in probability.

Conclusion: The calculations of the mean and the variance are quite easy. They show that the estimator is unbiased and, if the variance is finite, consistent.

Advanced Theory: If $E(X)$ had not been given in the statement, it could have been calculated by applying integration by parts (since polynomials and exponentials are functions of different type):
$$E(X) = \int_1^{\infty} x\,\frac{1}{\theta}e^{-\frac{x-1}{\theta}}\,dx = \left[-x\,e^{-\frac{x-1}{\theta}}\right]_1^{\infty} + \int_1^{\infty} e^{-\frac{x-1}{\theta}}\,dx = 1 + \left[-\theta\,e^{-\frac{x-1}{\theta}}\right]_1^{\infty} = 1 + \theta.$$
That $\int u(x)\,v'(x)\,dx = u(x)\,v(x) - \int u'(x)\,v(x)\,dx$ has been used with $u = x$, $u' = 1$, $v' = \frac{1}{\theta}e^{-\frac{x-1}{\theta}}$ and $v = -e^{-\frac{x-1}{\theta}}$. On the other hand, $e^x$ changes faster than $x^k$ for any $k$, so the boundary term vanishes at infinity. To calculate $E(X^2)$:

$$E(X^2) = \int_1^{\infty} x^2\,\frac{1}{\theta}e^{-\frac{x-1}{\theta}}\,dx = \left[-x^2 e^{-\frac{x-1}{\theta}}\right]_1^{\infty} + \int_1^{\infty} 2x\,e^{-\frac{x-1}{\theta}}\,dx = 1 + 2\theta\,\mu_1(\theta) = 1 + 2\theta(1+\theta).$$
Again, integration by parts has been applied, now with $u = x^2$, $u' = 2x$, $v' = \frac{1}{\theta}e^{-\frac{x-1}{\theta}}$ and $v = -e^{-\frac{x-1}{\theta}}$; and again, $e^x$ changes faster than $x^k$ for any $k$. Finally, the variance is
$$\sigma^2 = E(X^2) - E(X)^2 = 1 + 2\theta + 2\theta^2 - (1+\theta)^2 = \theta^2.$$
Regarding the original probability distribution: (i) the expression reminds us of the exponential distribution; (ii) the term $x - 1$ suggests a translation; and (iii) the variance $\theta^2$ is the same as the variance of the exponential distribution. After translating all possible values $x$, the mean is also translated but the variance is not. Thus, the distribution of the statement is a translation of the exponential distribution, which has two equivalent notations: $f(y;\theta) = \frac{1}{\theta}e^{-y/\theta}$, $y \ge 0$, or equivalently, when $\theta = 1/\lambda$, $f(y;\lambda) = \lambda e^{-\lambda y}$, $y \ge 0$. That is, $X - 1$ follows the exponential distribution with mean $\theta$.

Exercise 5ae
Is There Intelligent Life on Other Planets? In a 1997 Marist Institute survey of 935 randomly selected Americans, 60% of the sample answered "yes" to the question "Do you think there is intelligent life on other planets?" (http://maristpoll.marist.edu/tag/mipo/). Let's use this sample estimate to calculate a 90% confidence interval for the proportion of all Americans who believe there is intelligent life on other planets. What are the margin of error and the length of the interval?
(From Mind on Statistics. Utts, J.M., and R.F. Heckard. Thomson.)

LINGUISTIC NOTE (From: Common Errors in English Usage. Brians, P. William, James & Co.)
American. Many Canadians and Latin Americans are understandably irritated when U.S. citizens refer to themselves simply as "Americans." Canadians (and only Canadians) use the term "North American" to include themselves in a two-member group with their neighbor to the south, though geographers usually include Mexico in North America. When addressing an international audience composed largely of people from the Americas, it is wise to consider their sensitivities. However, it is pointless to try to ban this usage in all contexts. Outside of the Americas, "American" is universally understood to refer to things relating to the U.S. There is no good substitute.
Brazilians, Argentineans, and Canadians all have unique terms to refer to themselves. None of them refer routinely to themselves as "Americans" outside of contexts like the "Organization of American States." Frank Lloyd Wright promoted "Usonian," but it never caught on. For better or worse, "American" is standard English for "citizen or resident of the United States of America."

LINGUISTIC NOTE (From: Wikipedia. American (word).)
The meaning of the word American in the English language varies according to the historical, geographical, and political context in which it is used. American is derived from America, a term originally denoting all of the New World (also called the Americas). In some expressions, it retains this Pan-American sense, but its usage has evolved over time and, for various historical reasons, the word came to denote people or things specifically from the United States of America.

In modern English, Americans generally refers to residents of the United States; among native English speakers this usage is almost universal, with any other use of the term requiring specification.[1] However, this default use has been the source of controversy,[2][3] particularly among Latin Americans, who feel that using the term solely for the United States misappropriates it. They argue instead that "American" should denote persons or things from anywhere in North, Central or South America, not just the United States, which is only a part of North America.

Discussion: There are several complementary pieces of information in the statement that help us to identify the distribution of the population variable (Bernoulli distribution) and select the proper statistic T:
(a) The meaning of the question: for each item there are two possible values, yes or no.
(b) The value 60% suggests that this is a proportion expressed in percent.
(c) The words "Let's use this sample estimate" and "confidence interval for the proportion".
Thus, we must construct a confidence interval for the proportion $\eta$ (a percent is a proportion expressed on a 0-to-100 scale) of one Bernoulli population. The sample information available consists of two data: the sample size $n = 935$ and the sample proportion $\hat\eta = 0.6$. The relation between these quantities is the following:
$$\hat\eta = \frac{1}{n}\sum_{j=1}^n x_j = \frac{\#\,1\text{'s}}{n}.$$
Although it is not necessary, we could calculate the number of ones: $\#\,1\text{'s} = n\,\hat\eta = 935 \cdot 0.6 = 561$. Now, if we had not realized that 0.6 was the sample proportion, we would do $\hat\eta = 561/935 = 0.6$.

Identification of the variable: $X$ = Answered with "yes" (one American)? $\sim Ber(\eta)$

Confidence interval: For this kind of population and amount of data, we use the statistic
$$T(X;\eta) = \frac{\hat\eta - \eta}{\sqrt{\dfrac{?(1-?)}{n}}} \;\xrightarrow{d}\; N(0,1)$$
where ? is substituted by $\eta$ or $\hat\eta$. For confidence intervals $\eta$ is unknown and no value is supposed, and hence it is estimated through the sample proportion.
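Inverting this pivot leads to the usual asymptotic (Wald) interval $\hat\eta \mp r_{\alpha/2}\sqrt{\hat\eta(1-\hat\eta)/n}$. As a numerical cross-check of the values obtained in this exercise ($n = 935$, $\hat\eta = 0.6$, 90%) and in Exercise 3ae ($n = 1000$, 387 ones, 95%), a minimal R sketch follows; the function name `wald_interval` is only illustrative. It uses the exact normal quantile from `qnorm`, so the last decimals may differ slightly from values computed with rounded table quantiles.

```r
# Asymptotic (Wald) confidence interval for one Bernoulli proportion:
# eta_hat -/+ r_{alpha/2} * sqrt(eta_hat * (1 - eta_hat) / n)
wald_interval <- function(eta_hat, n, conf = 0.95) {
  alpha  <- 1 - conf
  r      <- qnorm(1 - alpha / 2)                  # P(Z > r) = alpha/2
  margin <- r * sqrt(eta_hat * (1 - eta_hat) / n) # margin of error E
  c(lower = eta_hat - margin, upper = eta_hat + margin, margin = margin)
}

# Exercise 5ae: n = 935, sample proportion 0.6, 90% confidence
ci5 <- wald_interval(0.6, 935, conf = 0.90)
print(round(ci5, 4))

# Exercise 3ae: n = 1000, 387 ones, 95% confidence
ci3 <- wald_interval(387 / 1000, 1000, conf = 0.95)
print(round(ci3, 4))
```

The length of each interval is simply `2 * margin`, which matches the relation L = 2E used below.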
By applying the method of the pivot:
$$1-\alpha = P\left(l_{\alpha/2} \le T(X;\eta) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{\hat\eta-\eta}{\sqrt{\hat\eta(1-\hat\eta)/n}} \le r_{\alpha/2}\right) = P\left(\hat\eta - r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}} \le \eta \le \hat\eta + r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}\right)$$

Then, the interval is
$$I_{1-\alpha} = \left[\hat\eta - r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}},\; \hat\eta + r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}}\right]$$

Substitution: We calculate the quantities in the formula: $n = 935$, $\hat\eta = 0.6$, and 90% $\rightarrow$ $1-\alpha = 0.90$ $\rightarrow$ $\alpha = 0.10$ $\rightarrow$ $\alpha/2 = 0.05$ $\rightarrow$ $r_{\alpha/2} = r_{0.05} = l_{0.95} = 1.645$. So
$$I_{0.90} = \left[0.6 - 1.645\sqrt{\frac{0.6(1-0.6)}{935}},\; 0.6 + 1.645\sqrt{\frac{0.6(1-0.6)}{935}}\right] = [0.574, 0.626]$$

Margin of error and length: To calculate the previous endpoints we had calculated the margin of error, which is
$$E = r_{\alpha/2}\sqrt{\frac{\hat\eta(1-\hat\eta)}{n}} = 1.645\sqrt{\frac{0.6(1-0.6)}{935}} = 0.026.$$
The length is twice the margin of error: $L = 2E = 0.0527$. In general, even if T follows an asymmetric distribution and we do not talk about margin of error, the length can always be calculated as the difference between the upper and the lower endpoints: $L = 0.626 - 0.574 = 0.052$ (the small discrepancy is due to rounding).

Conclusion: Since the population proportion is in the interval [0,1] by definition, the values obtained seem reasonable. Both endpoints are over 0.5, which suggests that most US citizens think there is intelligent life on other planets. With a confidence of 0.90, measured on a 0-to-1 scale, the value of $\eta$ will be in the interval obtained. As regards the methodology applied, on average it provides a right interval 90% of the times. Nonetheless, frequently we do not know the real $\eta$ and therefore we will never know whether the method has failed or not.

Exercise 6ae
It is desired to know the proportion $\eta$ of female students at university. To that end, a simple random sample of $n$ students is to be gathered. Obtain the estimators $\hat\eta_M$ and $\hat\eta_{ML}$ for that proportion, by applying the method of the moments and the maximum likelihood method.

Discussion: This statement is mathematical, really. Although it is well known, the expectation of $X$ could be calculated as follows:
$$\mu_1(\eta) = E(X) = \sum_{x\in\Omega} x\,f(x;\eta) = \sum_{x=0}^{1} x\,\eta^x(1-\eta)^{1-x} = 0\cdot(1-\eta) + 1\cdot\eta = \eta.$$

Method of the moments
Population and sample moments: The probability distribution has one parameter. The first-order moments are
$$\mu_1(\eta) = E(X) = \eta \quad\text{and}\quad m_1(x_1,\dots,x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar{x}.$$
System of equations: Since the parameter $\eta$ appears in the first-order moment of $X$, the first equation is sufficient to apply the method:
$$\mu_1(\eta) = m_1(x_1,\dots,x_n) \;\Rightarrow\; \eta = \bar{x}.$$
The estimator: $\hat\eta_M = \bar{X}$.

Maximum likelihood method
Likelihood function: For the Bernoulli distribution the mass function is $f(x;\eta) = \eta^x(1-\eta)^{1-x}$. Then
$$L(x_1,\dots,x_n;\eta) = \prod_{j=1}^n f(x_j;\eta) = \prod_{j=1}^n \eta^{x_j}(1-\eta)^{1-x_j} = \eta^{\sum_{j} x_j}(1-\eta)^{n-\sum_{j} x_j}.$$
Optimization problem: The logarithm function is applied to facilitate the calculations:
$$\log L(x_1,\dots,x_n;\eta) = \left(\sum_{j=1}^n x_j\right)\log\eta + \left(n - \sum_{j=1}^n x_j\right)\log(1-\eta).$$
To find the local or relative extreme values, the necessary condition is
$$0 = \frac{d}{d\eta}\log L(x_1,\dots,x_n;\eta) = \frac{\sum_j x_j}{\eta} - \frac{n - \sum_j x_j}{1-\eta} \;\Rightarrow\; (1-\eta)\sum_j x_j = \eta\left(n - \sum_j x_j\right) \;\Rightarrow\; \eta_0 = \frac{1}{n}\sum_{j=1}^n x_j = \bar{x}.$$
To verify that the only candidate is a local or relative maximum, the sufficient condition is
$$\frac{d^2}{d\eta^2}\log L(x_1,\dots,x_n;\eta) = -\frac{\sum_j x_j}{\eta^2} - \frac{n - \sum_j x_j}{(1-\eta)^2} < 0,$$
since $x_j \in \{0,1\}$ and therefore $\sum_j x_j \ge 0$ and $n - \sum_j x_j \ge 0$ (they cannot both be zero, as they add up to $n$). This holds for any value, including $\eta_0$.
The estimator: $\hat\eta_{ML} = \bar{X}$.

Conclusion: The two methods provide the same estimator.
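The analytical result $\hat\eta_{ML} = \bar{X}$ can be checked numerically by maximizing the Bernoulli log-likelihood directly. The R sketch below assumes a hypothetical sample of $n = 50$ values simulated with $\eta = 0.3$ (both values, and the seed, are arbitrary choices for illustration) and uses the base function `optimize`.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
n <- 50                                      # hypothetical sample size
x <- rbinom(n, size = 1, prob = 0.3)         # hypothetical Bernoulli sample (eta = 0.3 assumed)

# Log-likelihood: sum(x) * log(eta) + (n - sum(x)) * log(1 - eta)
loglik <- function(eta) sum(x) * log(eta) + (n - sum(x)) * log(1 - eta)

# Numerical maximization over (0, 1), avoiding the endpoints where log() diverges
opt <- optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)
print(c(numerical_mle = opt$maximum, sample_mean = mean(x)))
```

Both printed values should agree up to the tolerance of `optimize`, which illustrates that the method of the moments and the maximum likelihood method give the same estimator here.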

Exercise 7ae
A population variable $X$ is supposed to follow the continuous uniform distribution with parameters $\lambda_1 = 0$ and $\lambda_2 = \lambda$. A simple random sample of size $n$ is considered to estimate $\lambda$. Apply the method of the moments to build an estimator.

Discussion: The distribution considered has two parameters, though one of them is known.

Method of the moments
Population and sample moments: The first-order moments are
$$\mu_1(\lambda) = E(X) = \frac{0+\lambda}{2} = \frac{\lambda}{2} \quad\text{and}\quad m_1(x_1,\dots,x_n) = \frac{1}{n}\sum_{j=1}^n x_j = \bar{x}.$$
System of equations: Since the parameter of interest $\lambda$ appears in the first-order population moment of $X$, the first equation is enough to apply the method:
$$\frac{\lambda}{2} = \mu_1(\lambda) = m_1(x_1,\dots,x_n) = \bar{x} \;\Rightarrow\; \lambda = 2\bar{x}.$$
The estimator: $\hat\lambda_M = 2\bar{X}$.

Conclusion: To estimate the parameter $\lambda$, the method of the moments suggests twice the sample mean.

Exercise 8ae
Plastic sheets produced by a machine are constantly monitored for possible fluctuations in thickness (measured in millimeters, mm). If the true variance in thicknesses exceeds 2.25 square millimeters, there is cause for concern about product quality. The production process continues while the variance seems smaller than the cutoff. Thickness measurements for a simple random sample of 10 sheets produced in a particular shift were taken, giving the following results:
6, 6, 7, 6, 5, 8, 5, 6, 9, 7
Test, at the 5% significance level, the hypothesis that the population variance is smaller than 2.25 mm². Suppose that thickness is normally distributed. Calculate the type II error $\beta(2)$, find the general expression of $\beta(\sigma^2)$ and plot the power function.
(Based on an exercise of: Statistics for Business and Economics. Newbold, P., W.L. Carlson and B.M. Thorne. Pearson.)

Discussion: The supposition of normality should be evaluated. This statistical problem requires us to study the variance of a normal population; concretely, to apply a hypothesis test to see whether or not the value considered as reasonable has been exceeded. In other exercises we are given some quantities already calculated; here we are given the raw data, from which we can calculate any quantity.
The hypothesis is allocated at $H_1$ so that the production process continues only when high-quality sheets are being made, and the equality is allocated at $H_0$.
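The quantities this exercise needs (the quasivariance $S^2$, the statistic $T_0$, the critical value, the p-value and $\beta(2)$) can be cross-checked with a short R sketch. It assumes the ten thickness values listed in the statement and the cutoff $\sigma_0^2 = 2.25$ mm²; the critical value is taken from `qchisq` without rounding, so $\beta(2)$ may differ from the hand calculation in the third decimal.

```r
x <- c(6, 6, 7, 6, 5, 8, 5, 6, 9, 7)   # thickness data as listed in the statement (mm)
n <- length(x)
sigma2_0 <- 2.25                       # cutoff variance under H0 (mm^2)
alpha <- 0.05

S2 <- var(x)                           # sample quasivariance (divisor n - 1)
T0 <- (n - 1) * S2 / sigma2_0          # statistic under H0, chi-square with n - 1 df
a  <- qchisq(alpha, n - 1)             # critical value: Rc = {T0 < a}
pv <- pchisq(T0, n - 1)                # p-value for H1: sigma^2 < 2.25

# Type II error at sigma^2 = 2: P(T0 >= a | sigma^2 = 2) = P(T1 >= a * sigma2_0 / 2)
beta2 <- 1 - pchisq(a * sigma2_0 / 2, n - 1)

print(round(c(S2 = S2, T0 = T0, a = a, pv = pv, beta2 = beta2), 4))
```

Since `T0` is not below `a` (equivalently, `pv` exceeds 0.05), the sketch reaches the same decision as the calculations that follow: $H_0$ is not rejected.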

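Statistic: Since
- there is one normal population,
- the variance must be studied, and
- the mean is unknown (it is estimated through the sample mean),
the following statistic will be used:
$$T(X;\sigma) = \frac{n\,s^2}{\sigma^2} = \frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
As particular cases, when doing calculations under any hypothesis a value for $\sigma^2$ is supposed; for example, under $H_0$,
$$T_0(X) = \frac{(n-1)\,S^2}{\sigma_0^2} \sim \chi^2_{n-1}.$$
We will need the quantities
$$\bar{x} = \frac{1}{10}\sum_{j=1}^{10} x_j = \frac{1}{10}(6 + 6 + \cdots + 7)\ \text{mm} = 6.5\ \text{mm},$$
$$S^2 = \frac{1}{10-1}\sum_{j=1}^{10}(x_j - \bar{x})^2 = \frac{1}{9}\left[(6-6.5)^2 + \cdots + (7-6.5)^2\right]\text{mm}^2 = 1.61\ \text{mm}^2,$$
and
$$T_0(x) = \frac{(10-1)\cdot 1.61\ \text{mm}^2}{2.25\ \text{mm}^2} = 6.44.$$

One-sided (directional) hypothesis test
Hypotheses and form of the critical region: $H_0: \sigma^2 \ge \sigma_0^2 = 2.25$ and $H_1: \sigma^2 = \sigma_1^2 < 2.25$. For these hypotheses,
$$R_c = \{S^2 < c\} = \{T_0 < a\}.$$
By applying the definition of $\alpha$:
$$\alpha(2.25) = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(T \in R_c \mid H_0) = P(T_0 < a) \;\Rightarrow\; a = l_\alpha = 3.33 \;\Rightarrow\; R_c = \{T_0 < 3.33\}.$$
Decision: Finally, it is necessary to check whether this region suggested by $H_0$ is compatible with the value that the data provide for the statistic. If they are not compatible, because the value seems extreme when the hypothesis is true, we will trust the data and reject the hypothesis $H_0$. Since $T_0(x) = 6.44 > 3.33$, $T_0(x) \notin R_c$ and $H_0$ is not rejected.

The second methodology is based on the calculation of the p-value:
$$pv = P(X \text{ more rejecting than } x \mid H_0 \text{ true}) = P(T_0 < T_0(x)) = P(T_0 < 6.44) = 0.305$$
> pchisq(6.44, 10-1)
[1] 0.3047995
Since $pv = 0.305 > 0.05 = \alpha$, $H_0$ is not rejected.

Type II error and power function: To calculate $\beta$, we have to work under $H_1$, that is, with $T_1$. Since the critical region is expressed in terms of $T_0$, the mathematical trick of multiplying and dividing by the same quantity is applied: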

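$$\beta(\sigma_1^2) = P(\text{Type II error}) = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(T_0 \notin R_c \mid H_1) = P(T_0 \ge 3.33 \mid H_1) = P\left(\frac{(n-1)S^2}{\sigma_0^2} \ge 3.33 \,\middle|\, H_1\right) = P\left(\frac{(n-1)S^2}{\sigma_1^2}\,\frac{\sigma_1^2}{\sigma_0^2} \ge 3.33 \,\middle|\, H_1\right) = P\left(T_1 \ge 3.33\,\frac{\sigma_0^2}{\sigma_1^2}\right)$$
For the particular value $\sigma_1^2 = 2$,
$$\beta(2) = P\left(T_1 \ge 3.33\cdot\frac{2.25}{2}\right) = P(T_1 \ge 3.75) = 0.927$$
> 1 - pchisq(3.75, 10-1)
[1] 0.9271083
By using a computer, many other values $\sigma_1^2$ can be considered so as to numerically determine the power curve $1-\beta(\sigma_1^2)$ of the test and to plot the power function
$$\phi(\sigma^2) = P(\text{Reject } H_0) = \begin{cases} \alpha(\sigma^2) & \text{if } \sigma^2 \in \Theta_0 \\ 1-\beta(\sigma^2) & \text{if } \sigma^2 \in \Theta_1 \end{cases}$$

# Sample and inference
n = 10
alpha = 0.05
theta0 = 2.25   # Value under the null hypothesis H0
q = qchisq(alpha, n-1)
theta = seq(from=0, to=4.5, 0.01)   # grid of variances (upper limit partly lost in the source; 4.5 assumed)
paramSpace = sort(unique(c(theta, theta0)))
PowerFunction = pchisq(q*theta0/paramSpace, n-1)   # P(T0 < q) when the true variance is paramSpace
plot(paramSpace, PowerFunction, xlab='theta', ylab='probability of rejecting theta0', main='Power Function', type='l')

Conclusion: The null hypothesis $H_0: \sigma^2 \ge 2.25$ mm² is not rejected at the 5% significance level ($T_0(x) = 6.44 \notin R_c$, equivalently $pv = 0.305 > 0.05$). Hence the data do not provide enough evidence that the variance is smaller than 2.25 mm², the quality of the product cannot be guaranteed and, according to the statement, there is cause for concern. Notice also that $\beta(2) = 0.927$: with only 10 sheets, even a true variance of 2 mm² would go undetected most of the time. The method controls at 5% the probability of wrongly rejecting $H_0$; however, since frequently we do not know the true value of $\sigma^2$, we never know whether a particular decision is right or not.

Exercise 9ae
If 132 of 200 male voters and 90 of 159 female voters favor a certain candidate running for governor of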

Illinois, find a 99% confidence interval for the difference between the actual proportions of male and female voters who favor the candidate.
(From: Mathematical Statistics with Applications. Miller, I., and M. Miller. Pearson.)

Discussion: There are two independent Bernoulli populations whose proportions must be compared (the populations would not be independent if, for example, males and females had been selected from the same couples or families). The value 1 has been used to count the number of voters who favor the candidate. The method of the pivot will be used.

Identification of the variable: Favoring or not is a dichotomous situation:
$M$ = Favoring the candidate (M)? $\sim Ber(\eta_M)$
$F$ = Favoring the candidate (F)? $\sim Ber(\eta_F)$

Pivot: We take into account that:
- There are two independent Bernoulli populations.
- Both sample sizes are large, so an asymptotic approximation can be applied.
From a table of statistics (e.g. in [T]), the following pivot is selected:
$$T(M,F;\eta_M,\eta_F) = \frac{(\hat\eta_M - \hat\eta_F) - (\eta_M - \eta_F)}{\sqrt{\dfrac{\hat\eta_M(1-\hat\eta_M)}{n_M} + \dfrac{\hat\eta_F(1-\hat\eta_F)}{n_F}}} \;\xrightarrow{d}\; N(0,1)$$

Event rewriting: Writing, for brevity, $\widehat{SE} = \sqrt{\frac{\hat\eta_M(1-\hat\eta_M)}{n_M} + \frac{\hat\eta_F(1-\hat\eta_F)}{n_F}}$,
$$1-\alpha = P\left(l_{\alpha/2} \le T(M,F;\eta_M,\eta_F) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{(\hat\eta_M - \hat\eta_F) - (\eta_M - \eta_F)}{\widehat{SE}} \le r_{\alpha/2}\right) = P\left((\hat\eta_M - \hat\eta_F) - r_{\alpha/2}\,\widehat{SE} \le \eta_M - \eta_F \le (\hat\eta_M - \hat\eta_F) + r_{\alpha/2}\,\widehat{SE}\right)$$

The interval:
$$I_{1-\alpha} = \left[(\hat\eta_M - \hat\eta_F) - r_{\alpha/2}\sqrt{\frac{\hat\eta_M(1-\hat\eta_M)}{n_M} + \frac{\hat\eta_F(1-\hat\eta_F)}{n_F}},\; (\hat\eta_M - \hat\eta_F) + r_{\alpha/2}\sqrt{\frac{\hat\eta_M(1-\hat\eta_M)}{n_M} + \frac{\hat\eta_F(1-\hat\eta_F)}{n_F}}\right]$$
where $r_{\alpha/2}$ is the value of the standard normal distribution such that $P(Z > r_{\alpha/2}) = \alpha/2$.

Substitution: We need to calculate the quantities involved in the previous formula: $n_M = 200$ and $n_F = 159$.
Theoretical simple random sample: $M_1,\dots,M_{200}$ s.r.s. (each value is 1 or 0).

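Empirical sample: $m_1,\dots,m_{200}$, with $\sum_{j=1}^{200} m_j = 132$, so $\hat\eta_M = \frac{1}{200}\sum_{j=1}^{200} m_j = \frac{132}{200} = 0.66$.
Theoretical simple random sample: $F_1,\dots,F_{159}$ s.r.s. (each value is 1 or 0).
Empirical sample: $f_1,\dots,f_{159}$, with $\sum_{j=1}^{159} f_j = 90$, so $\hat\eta_F = \frac{1}{159}\sum_{j=1}^{159} f_j = \frac{90}{159} = 0.566$.
99% $\rightarrow$ $1-\alpha = 0.99$ $\rightarrow$ $\alpha = 0.01$ $\rightarrow$ $\alpha/2 = 0.005$ $\rightarrow$ $r_{\alpha/2} = 2.576$.
Then,
$$I_{0.99} = \left[(0.66 - 0.566) \mp 2.576\sqrt{\frac{0.66(1-0.66)}{200} + \frac{0.566(1-0.566)}{159}}\right] = [-0.03906,\ 0.2270]$$

Conclusion: The interval contains 0 (its lower endpoint, -0.03906, is slightly negative, while its upper endpoint, 0.2270, is positive), so the case $\eta_M = \eta_F$ cannot formally be excluded when the decision is made with 99% confidence. Since $\eta_M - \eta_F \in [-1,1]$, the endpoints obtained are reasonable; because of the natural uncertainty of the sampling process (randomness and variability), a lower endpoint this close to 0 is compatible with equal proportions. When an interval of high confidence is far from 0, the case $\eta_M = \eta_F$ can clearly be rejected. Finally, it is important to notice that a confidence interval can be used to make decisions about hypotheses on the parameter values.

Exercise 10ae
For two Bernoulli populations with the same parameter, prove that the pooled sample proportion is an unbiased estimator of the population proportion. For two normal populations, prove that the pooled sample variance is an unbiased estimator of the population variance.

Discussion: It is necessary to calculate the expectation of the pooled sample proportion by using its expression and the basic properties of the mean. Alternatively, the most general pooled sample variance can be used. For Bernoulli populations, the mean and the variance can be written as $\mu = \eta$ and $\sigma^2 = \eta(1-\eta)$.

Mean of $\hat\eta_p$: This estimator can be used when $\eta_X = \eta_Y = \eta$. On the other hand, $E(\hat\eta) = E(\bar{X}) = \eta$. Then
$$E(\hat\eta_p) = E\left(\frac{n_X\hat\eta_X + n_Y\hat\eta_Y}{n_X + n_Y}\right) = \frac{n_X E(\hat\eta_X) + n_Y E(\hat\eta_Y)}{n_X + n_Y} = \frac{n_X\eta + n_Y\eta}{n_X + n_Y} = \eta.$$
Then, the bias is $b(\hat\eta_p) = E(\hat\eta_p) - \eta = \eta - \eta = 0$.

Mean of $S_p^2$: This estimator can be used when $\sigma_X^2 = \sigma_Y^2 = \sigma^2$. On the other hand, $E(S^2) = \sigma^2$. Then
$$E(S_p^2) = E\left(\frac{(n_X-1)S_X^2 + (n_Y-1)S_Y^2}{n_X + n_Y - 2}\right) = \frac{(n_X-1)E(S_X^2) + (n_Y-1)E(S_Y^2)}{n_X + n_Y - 2} = \frac{(n_X-1)\sigma^2 + (n_Y-1)\sigma^2}{n_X + n_Y - 2} = \sigma^2.$$
Then, the bias is $b(S_p^2) = E(S_p^2) - \sigma^2 = \sigma^2 - \sigma^2 = 0$.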
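Unbiasedness is a statement about the average of an estimator over repeated samples, so a small Monte Carlo experiment can illustrate both results. The R sketch below is only an illustration under assumed values ($n_X = 30$, $n_Y = 20$, $\eta = 0.4$, $\sigma = 2$, 20000 replications, and an arbitrary seed); for the variance, the two normal populations are given different means on purpose, since only the common variance matters.

```r
set.seed(123)                                   # arbitrary seed
nx <- 30; ny <- 20                              # assumed sample sizes
eta <- 0.4; sigma <- 2                          # assumed common parameters
B <- 20000                                      # number of simulated samples

# Pooled sample proportion from two Bernoulli samples with the same eta
pooled_prop <- replicate(B, {
  x <- rbinom(nx, 1, eta); y <- rbinom(ny, 1, eta)
  (nx * mean(x) + ny * mean(y)) / (nx + ny)
})

# Pooled sample variance from two normal samples with the same sigma
pooled_var <- replicate(B, {
  x <- rnorm(nx, mean = 0, sd = sigma); y <- rnorm(ny, mean = 5, sd = sigma)
  ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
})

print(c(mean_pooled_prop = mean(pooled_prop), eta = eta,
        mean_pooled_var = mean(pooled_var), sigma2 = sigma^2))
```

The simulated averages should be close to $\eta$ and $\sigma^2$, with differences of the size of the Monte Carlo error.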

Exercise 11ae
A research worker wants to determine the average time it takes a mechanic to rotate the tires of a car, and she wants to be able to assert with 95% confidence that the mean of her sample is off by at most 0.50 minute. If she can presume from past experience that $\sigma = 1.6$ minutes (min), how large a sample will she have to take?
(From Probability and Statistics for Engineers. Johnson, R. Pearson Prentice Hall.)

Discussion: In calculating the minimum sample size, the only case we consider in our subject is that of one normal population with known standard deviation. Thus, we can suppose that this is the distribution of $X$.

Identification of the variable: $X$ = Time of one rotation $\sim N(\mu, \sigma = 1.6\ \text{min})$

Sample information: Theoretical simple random sample: $X_1,\dots,X_n$ s.r.s. (the time measurements of $n$ rotations will be considered).

Margin of error: We need the expression of the margin of error. If we do not remember it, we can apply the method of the pivot to take the expression from the formula of the interval:
$$I_{1-\alpha} = \left[\bar{X} - r_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + r_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
If we remember the expression, we can use it directly. Either way, the margin of error for one normal population with known variance is
$$E = r_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

Sample size
Method based on the confidence interval: We want the margin of error $E$ to be smaller than or equal to the given $E_g$:
$$E_g \ge E = r_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Leftrightarrow\; \sqrt{n} \ge r_{\alpha/2}\frac{\sigma}{E_g} \;\Leftrightarrow\; n \ge \left(r_{\alpha/2}\frac{\sigma}{E_g}\right)^2 = \left(1.96\cdot\frac{1.6\ \text{min}}{0.50\ \text{min}}\right)^2 = 6.272^2 = 39.3 \;\Rightarrow\; n \ge 40,$$
since $r_{\alpha/2} = r_{0.05/2} = r_{0.025} = l_{0.975} = 1.96$. The inequality does not change when multiplying or dividing by positive quantities or when squaring them.

Conclusion: At least 40 data are necessary to guarantee that the margin of error is 0.50 min at most. Any larger sample size would guarantee, and go beyond, the precision desired. This margin can be thought of as the maximum error in probability, in the sense that the distance or error $|\hat\theta - \theta|$ will be smaller than $E$ with a probability of $1-\alpha = 0.95$, but larger with a probability of $\alpha = 0.05$.

Exercise 12ae
To estimate the average tree height of a forest, a simple random sample with 20 elements is considered,

yielding $\bar{x} = 14.70$ u and $S = 6.34$ u, where u denotes a unit of length and $S^2$ is the sample quasivariance. If the population variable height is supposed to follow a normal distribution, find a 95 percent confidence interval. What is the margin of error?

Discussion: In this exercise, the supposition that the normal distribution reasonably explains the variable height should be evaluated by using proper statistical techniques. To build the interval and find the margin of error, the method of the pivotal quantity will be applied.

Pivot: From the information in the statement, we know that:
- The variable follows a normal distribution.
- The population variance $\sigma^2$ is unknown, so it must be estimated.
- The sample size is 20 (asymptotic results cannot be considered).
To apply this method, we need a statistic with known distribution, easy to manage and involving $\mu$. From a table of statistics (e.g. in [T]), we select
$$T(X;\mu) = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} \sim t_{n-1}$$
where $X = (X_1, X_2, \dots, X_n)$ is a simple random sample, $S^2$ is the sample quasivariance and $t_\kappa$ denotes the t distribution with $\kappa$ degrees of freedom.

Event rewriting: The interval is built as follows:
$$1-\alpha = P\left(l_{\alpha/2} \le T(X;\mu) \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2} \le \frac{\bar{X}-\mu}{S/\sqrt{n}} \le r_{\alpha/2}\right) = P\left(-r_{\alpha/2}\frac{S}{\sqrt{n}} \le \bar{X}-\mu \le r_{\alpha/2}\frac{S}{\sqrt{n}}\right) = P\left(\bar{X} - r_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + r_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$

The interval:
$$I_{1-\alpha} = \left[\bar{X} - r_{\alpha/2}\frac{S}{\sqrt{n}},\; \bar{X} + r_{\alpha/2}\frac{S}{\sqrt{n}}\right]$$
Note: We have simplified the notation, but it is important to notice that the quantities $r_{\alpha/2}$ and $S$ depend on the sample size $n$.

To use this general formula with the specific data we have, the quantiles of the t distribution with $\kappa = n - 1 = 20 - 1 = 19$ degrees of freedom are necessary: 95% $\rightarrow$ $1-\alpha = 0.95$ $\rightarrow$ $\alpha = 0.05$. In the table of the t distribution, we must search the quantile provided for the probability $p = 1 - \alpha/2 = 0.975$ in a lower-tail probability table, or $p = \alpha/2 = 0.025$ in an upper-tail probability table; if a two-tailed table is used, the quantile given for $p = 1-\alpha = 0.950$ must be used. Whichever the table used, the quantile is 2.093. Finally,
$$I_{0.95} = \left[\bar{x} \mp r_{0.05/2}\frac{S}{\sqrt{20}}\right] = \left[14.70\ \text{u} \mp 2.093\cdot\frac{6.34}{\sqrt{20}}\ \text{u}\right] = [14.70\ \text{u} \mp 2.97\ \text{u}] = [11.73\ \text{u},\ 17.67\ \text{u}]$$
By applying the definition of the margin of error,

$$E = r_{\alpha/2}\frac{S}{\sqrt{n}} = 2.093\cdot\frac{6.34\ \text{u}}{\sqrt{20}} = 2.97\ \text{u}.$$

Conclusion: With 95% confidence we can say that the mean tree height is in the interval obtained. The margin of error, which is expressed in the same unit of measure as the data, can be thought of as the maximum distance (when the interval contains the true value) between the real unknown mean and the middle point of the interval, that is, the maximum error in probability.
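The two calculations of Exercises 11ae and 12ae can be reproduced with a short R sketch. The helper names `min_sample_size` and `t_interval` are illustrative only; the inputs are the values used above ($\sigma = 1.6$ min, $E_g = 0.50$ min for 11ae; $n = 20$, $\bar{x} = 14.70$ u, $S = 6.34$ u for 12ae), and the quantiles come from `qnorm` and `qt` rather than from a table.

```r
# Exercise 11ae: smallest n with r_{alpha/2} * sigma / sqrt(n) <= Eg (known sigma)
min_sample_size <- function(sigma, Eg, conf = 0.95) {
  r <- qnorm(1 - (1 - conf) / 2)
  ceiling((r * sigma / Eg)^2)          # round up: n must be an integer
}
n11 <- min_sample_size(sigma = 1.6, Eg = 0.50)

# Exercise 12ae: t interval for the mean with unknown variance
t_interval <- function(xbar, S, n, conf = 0.95) {
  r <- qt(1 - (1 - conf) / 2, df = n - 1)
  margin <- r * S / sqrt(n)            # margin of error E
  c(lower = xbar - margin, upper = xbar + margin, margin = margin)
}
ci12 <- t_interval(xbar = 14.70, S = 6.34, n = 20)

print(n11)
print(round(ci12, 2))
```

Rounding up with `ceiling` is what turns the bound 39.3 into the minimum sample size of 40.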

Appendixes

[Ap] Probability Theory

Remark 1pt: The probability is a measure (on a 0-to-1 scale) of the chances with which an event involving a random quantity occurs; alternatively, it can be interpreted as the proportion of times it occurs when many values of the random quantity are considered repeatedly and independently. For example, an event of confidence $1-\alpha = 0.95$ can be considered in two equivalent ways: (i) that a measure of its occurring, within [0,1], takes the value 0.95; or (ii) that when the experiment is independently repeated many times, the event will occur approximately 95% of the times. Once values for the random quantities are determined, the event will have occurred or not, but no probability is involved any more.

Some Reminders

Markov's Inequality. Chebyshev's Inequality. For any real random variable $X$, any real function $h(x)$ taking nonnegative values, and any real positive $a > 0$,
$$E(h(X)) = \int_\Omega h(x)\,dP = \int_{\{h(X)<a\}} h(x)\,dP + \int_{\{h(X)\ge a\}} h(x)\,dP \ge \int_{\{h(X)\ge a\}} h(x)\,dP \ge \int_{\{h(X)\ge a\}} a\,dP = a\int_{\{h(X)\ge a\}} dP = a\,P(h(X)\ge a).$$
Then, Markov's inequality is obtained:
$$P(h(X) \ge a) \le \frac{E(h(X))}{a}.$$
For discrete $X$, the same proof can be rewritten with sums or, alternatively, sums can be seen as a particular case of Riemann-Stieltjes integrals. For continuous $X$, the measure $dP$ can be written as $f(x)\,dx$ for well-behaved distributions (called absolutely continuous) such that a density function $f(x)$ exists. The following cases are of special interest:
- When $h(x) = (x-\mu)^2$, we have that $P\left((X-\mu)^2 \ge a\right) \le \frac{E\left((X-\mu)^2\right)}{a} = \frac{Var(X)}{a}$.
- When $h(x) = |x|$, we have that $P(|X| \ge a) \le \frac{E(|X|)}{a}$.
- When $h(x) = |x-\mu|^r$, we have that $P\left(|X-\mu|^r \ge a\right) \le \frac{E\left(|X-\mu|^r\right)}{a}$.
- When $h(x) = x$ but $X$ itself is a nonnegative random variable: $P(X \ge a) \le \frac{E(X)}{a}$.
Chebyshev's inequality can be obtained as follows (a proof similar to that above can be written too):
$$P(|X-\mu| \ge a) = P\left((X-\mu)^2 \ge a^2\right) \le \frac{E\left((X-\mu)^2\right)}{a^2} = \frac{Var(X)}{a^2}.$$
The positive branch of the square root is a strictly increasing function, so the events in the two probabilities are the same. A similar inequality can be obtained with $r$ instead of 2. We can make $a = k\sigma$ to calculate the probability that $X$ takes values farther from $\mu$ than $k$ times $\sigma$. For example,

$$a = 2\sigma:\quad P(|X-\mu| \ge 2\sigma) \le \frac{\sigma^2}{4\sigma^2} = 0.25, \qquad a = 3\sigma:\quad P(|X-\mu| \ge 3\sigma) \le \frac{\sigma^2}{9\sigma^2} = \frac{1}{9} \approx 0.11.$$
Interpretation of the first case: the probability that $X$ takes a value farther from the mean $\mu$ than twice the standard deviation $\sigma$ is 0.25 at most. All these inequalities are true whatever the probability distribution of $X$, and the proof above is based on bounding in a rough way. They are nonparametric (distribution-free) inequalities. As a consequence, it seems reasonable to expect that there will be more powerful inequalities either when additional or stronger nonparametric results are used or when a parametric approach is considered (for example, in calculating the minimum sample size necessary to guarantee a given precision, we can also apply methods using statistics T based on asymptotic or parametric results).

Generating Functions. This section has been extracted from Probability and Random Processes. Grimmett, G., and D. Stirzaker. Oxford University Press, 3rd ed.

In Probability, generating functions are useful tools to work with (e.g. when convolutions or sums of independent variables are considered). Let $a = (a_0, a_1, a_2, \dots)$ be a sequence. The simplest one is the ordinary generating function of $a$, defined as
$$G_a(t) = \sum_{i=0}^{\infty} a_i t^i, \qquad t \in \mathbb{R} \text{ for which the sum converges.}$$
The sequence may in principle be reconstructed from the function by setting $a_j = G_a^{(j)}(0)/j!$. This function is especially useful when the $a_i$ are probabilities. The exponential generating function of $a$ is
$$E_a(t) = \sum_{j=0}^{\infty} \frac{a_j t^j}{j!}, \qquad t \in \mathbb{R} \text{ for which the sum converges.}$$
On the other hand, the probability generating function of a random variable $X$ taking nonnegative integer values is defined as
$$G_X(t) = E(t^X), \qquad t \in \mathbb{R} \text{ for which there is convergence}$$
(some authors give a definition for $z \in \mathbb{C}$, and the radius of convergence is one at least). There are two major applications of probability generating functions: in calculating moments, and in calculating the distributions of sums of independent variables.
Theorem: $E(X) = G'(1)$ and, more generally, $E[X(X-1)\cdots(X-k+1)] = G^{(k)}(1)$. Of course, $G^{(k)}(1)$ is shorthand for $\lim_{s\uparrow 1} G^{(k)}(s)$ whenever the radius of convergence of $G$ is 1.
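The theorem can be illustrated numerically. For the binomial distribution $Bin(n, \eta)$ the probability generating function is $G(t) = (1 - \eta + \eta t)^n$, so $G'(1) = n\eta$ and $G''(1) = n(n-1)\eta^2$. The R sketch below approximates these derivatives with central finite differences (a numerical stand-in for the exact derivative, with an assumed step $h = 10^{-4}$) for $Bin(11, 0.3)$, the distribution of Exercise 1pt(b), and recovers the first two raw moments.

```r
n <- 11; eta <- 0.3                           # Bin(11, 0.3)
G <- function(t) (1 - eta + eta * t)^n        # PGF of the binomial distribution
h <- 1e-4                                     # finite-difference step (assumed)

G1 <- (G(1 + h) - G(1 - h)) / (2 * h)         # approximates G'(1)  = E(X)
G2 <- (G(1 + h) - 2 * G(1) + G(1 - h)) / h^2  # approximates G''(1) = E(X(X-1))

EX  <- G1
EX2 <- G2 + G1                                # E(X^2) = G''(1) + G'(1)
print(c(EX = EX, EX2 = EX2, Var = EX2 - EX^2))
```

The exact values are $E(X) = 3.3$, $E(X^2) = 13.2$ and $Var(X) = n\eta(1-\eta) = 2.31$.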
Particularly, to calculate the first two raw moments:
$$E(X) = G'(1), \qquad E(X^2) = E(X(X-1)) + E(X) = G''(1) + G'(1).$$
If you are more interested in the moments of $X$ than in its mass function, you may prefer to work not with $G$ but with the function $M$ called the moment generating function and defined by
$$M(t) = G(e^t) = E(e^{tX}), \qquad t \in \mathbb{R} \text{ for which there is convergence.}$$
It is, under convergence, the exponential generating function of the moments $E(X^k)$. It holds that
Theorem: $E(X) = M'(0)$ and, more generally, $E(X^k) = M^{(k)}(0)$.
Particularly, to calculate the first two raw moments,

$$E(X) = M'(0), \qquad E(X^2) = M''(0).$$
Moment generating functions provide a very useful technique but suffer the disadvantage that the integrals which define them may not always be finite. Rather than explore their properties in detail we move on immediately to another class of functions that are equally useful and whose finiteness is guaranteed. The characteristic function is defined by
$$\varphi(t) = E(e^{itX}), \qquad t \in \mathbb{R},\ i = \sqrt{-1}.$$
First and foremost, from the knowledge of $\varphi$ we can recapture the distribution of $X$. The characteristic function of a distribution is closely related to its moment generating function: moment generating functions are related to Laplace transforms while characteristic functions are related to Fourier transforms.
Theorem: (a) If $\varphi^{(k)}(0)$ exists then $E|X^k| < \infty$ if $k$ is even, and $E|X^{k-1}| < \infty$ if $k$ is odd. (b) If $E|X^k| < \infty$ then $\varphi^{(k)}(0) = i^k E(X^k)$, so $E(X^k) = \varphi^{(k)}(0)/i^k$.
Then, to calculate the first two crude moments,
$$E(X) = \frac{\varphi'(0)}{i}, \qquad E(X^2) = \frac{\varphi''(0)}{i^2} = -\varphi''(0).$$

Summary of results for calculating the crude moments $E(X^k)$.
Generating Functions to Calculate Raw Moments
- Probability generating function ($X$ discrete). Definition: $G(t) = E(t^X)$, $t \in \mathbb{R}$. Theorem: $E[X(X-1)\cdots(X-k+1)] = G^{(k)}(1)$.
- Moment generating function. Definition: $M(t) = E(e^{tX})$, $t \in \mathbb{R}$. Theorem: $E(X^k) = M^{(k)}(0)$.
- Characteristic function. Definition: $\varphi(t) = E(e^{itX})$, $t \in \mathbb{R}$, $i = \sqrt{-1}$. Theorem: $E(X^k) = \varphi^{(k)}(0)/i^k$.

Existence: Techniques for series and integrals must be used to determine the values of $t \in \mathbb{R}$ that guarantee the convergence and hence the existence of the generating function. When possible, we drop the subindex $X$ of the functions to simplify the notation. The reader can consult the literature on Probability to see whether it is allowed to differentiate inside the series or the integrals, which is equivalent to differentiating inside the expectation. On the other hand, there are other generating functions in the literature: joint probability generating function, joint characteristic function, cumulant generating function, et cetera.
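The moment generating function and the characteristic function can be compared on the same distribution. For the Poisson distribution $Pois(\lambda)$ the standard closed forms are $M(t) = e^{\lambda(e^t - 1)}$ and $\varphi(t) = e^{\lambda(e^{it} - 1)}$, and the summary above gives $E(X) = M'(0) = \varphi'(0)/i = \lambda$ and $E(X^2) = M''(0) = -\varphi''(0) = \lambda + \lambda^2$. The R sketch below checks this for $\lambda = 2.7$ (the value of Exercise 1pt(a)) with central finite differences, an assumed numerical approximation of the derivatives; R handles the complex arithmetic of $\varphi$ natively.

```r
lambda <- 2.7
M   <- function(t) exp(lambda * (exp(t) - 1))        # MGF of Pois(lambda)
phi <- function(t) exp(lambda * (exp(1i * t) - 1))   # characteristic function (complex-valued)
h <- 1e-4                                            # finite-difference step (assumed)

M1   <- (M(h) - M(-h)) / (2 * h)                     # approximates M'(0)
M2   <- (M(h) - 2 * M(0) + M(-h)) / h^2              # approximates M''(0)
phi1 <- (phi(h) - phi(-h)) / (2 * h)                 # approximates phi'(0)
phi2 <- (phi(h) - 2 * phi(0) + phi(-h)) / h^2        # approximates phi''(0)

print(c(EX_M = M1, EX2_M = M2,
        EX_phi = Re(phi1 / 1i), EX2_phi = Re(phi2 / 1i^2)))
```

All four printed values should be close to $E(X) = 2.7$ and $E(X^2) = 2.7 + 2.7^2 = 9.99$.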

Exercise 1pt
In the following cases, calculate the probability or find the quantile:
(a) $Pois(2.7)$, $P(1 \le X < 3)$
(b) $Bin(11, 0.3)$, $P(X \le 2)$
(c) $UnifDisc(6)$, $P(X \in \{2, 5\})$
(d) $UnifCont(2, 5)$, $P(X \ge 3.5)$
(e) $N(\mu = -1, \sigma^2 = 4)$, $P(X > -4.4)$
(f) $\chi^2_{16}$, $P(X \le a) = 0.025$
(g) $t_{27}$, $P(X > a) = 0.1$
(h) $F_{10,8}$, $P(X > 5.81)$
(i) $F_{15,6}$, $P(X > a) = 0.01$
(j) $t_{12}$, $P(\{X \le 1.356\} \cup \{X > 3.055\})$

Discussion: Several distributions, discrete and continuous, are involved in this exercise. Different ways can be considered to find the answers: the probability function $f(x)$, the probability tables or a statistical software program. Sometimes events need to be rewritten or decomposed. For discrete distributions, tables can contain either individual $\{X = x\}$ or cumulative $\{X \le x\}$ or $\{X > x\}$ probabilities; for continuous distributions, only cumulative probabilities.

(a) The parameter value is $\lambda = 2.7$, and for the Poisson distribution the possible values are always 0, 1, 2, ... If the table provides cumulative probabilities of the form $P(X \le x)$,
$$P(1 \le X < 3) = P(X \le 2) - P(X \le 0).$$
If the table provides individual probabilities,
$$P(1 \le X < 3) = P(X = 1) + P(X = 2) = 0.1815 + 0.2450 = 0.4265.$$
By using the mass function,
$$P(1 \le X < 3) = P(X=1) + P(X=2) = \frac{2.7^1}{1!}e^{-2.7} + \frac{2.7^2}{2!}e^{-2.7} = 0.1814549 + 0.2449641 = 0.4264190.$$
Finally, by using the statistical software program R, whose function gives cumulative probabilities,
> ppois(2, 2.7) - ppois(0, 2.7)
[1] 0.426419
To plot the probability function:
values = seq(0, 10)
probabilities = dpois(values, 2.7)
plot(values, probabilities, type="h", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="Pois(2.7)")

(b) The parameter values are $\kappa = 11$ and $\eta = 0.3$, so the possible values are 0, 1, 2, ..., 11. If the table of the binomial distribution gives individual probabilities $P(X = x)$,
$$P(X \le 2) = P(X=0) + P(X=1) + P(X=2) = 0.0198 + 0.0932 + 0.1998 = 0.3128.$$
If cumulative probabilities were given in the table, the probability $P(X \le 2)$ would be provided directly. On the other hand, the mass function can be used too:
$$P(X \le 2) = P(X=0) + P(X=1) + P(X=2) = \binom{11}{0}0.3^0\,0.7^{11} + \binom{11}{1}0.3^1\,0.7^{10} + \binom{11}{2}0.3^2\,0.7^{9}$$

$$= \frac{11!}{0!\,11!}\,0.7^{11} + \frac{11!}{1!\,10!}\,0.3\cdot 0.7^{10} + \frac{11!}{2!\,9!}\,0.3^2\cdot 0.7^{9} = 0.0197733 + 0.0932168 + 0.1997504 = 0.3127405.$$
Finally, by using the statistical software program R, whose function gives cumulative probabilities,
> pbinom(2, 11, 0.3)
[1] 0.3127405
To plot the probability function:
values = seq(0, 11)
probabilities = dbinom(values, 11, 0.3)
plot(values, probabilities, type="h", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="Bin(11, 0.3)")

(c) The parameter value is $\kappa = 6$, so the possible values are 1, 2, ..., 6. This probability distribution is so simple that no table is needed. Since the event can be decomposed into two disjoint elementary outcomes,
$$P(X \in \{2, 5\}) = P(X = 2) + P(X = 5) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}.$$
To plot the probability function:
values = seq(1, 6)
probabilities = rep(1/6, length(values))
plot(values, probabilities, type="h", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="UnifDisc(6)")

(d) The parameter values are $\kappa_1 = 2$ and $\kappa_2 = 5$, so the possible values are the real numbers in the interval [2, 5] (or with open endpoints, depending on the definition of the uniform distribution that you are considering). No table is necessary for this distribution, and if we realize that 3.5 is the middle value between 2 and 5, no calculation is needed either: $P(X \ge 3.5) = 0.5$. If not, we can use the density function:
$$P(X \ge 3.5) = \int_{3.5}^{5} \frac{1}{5-2}\,dx = \frac{5 - 3.5}{3} = \frac{1.5}{3} = 0.5.$$
To plot the density function:
values = seq(2, 5)
probabilities = rep(1/(5-2), length(values))
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="UnifCont(2, 5)")
For continuous distributions, the probability of any isolated value is zero, so $P(X \ge 3.5) = P(X > 3.5)$.

(e) Here the parameter values are $\mu = -1$ and $\sigma^2 = 4$, and the value of a normally distributed random variable can always be any real number. Because of the standardization, a table with probabilities and quantiles for the standard normal distribution suffices. In using this table, we must pay attention to the form of the events whose probabilities are provided:

$$P(X > -4.4) = P\left(\frac{X-\mu}{\sigma} > \frac{-4.4-(-1)}{2}\right) = P(Z > -1.7) = P(Z < 1.7) = 0.9554.$$
Writing the event in terms of +1.7 is necessary when the table contains only positive quantiles. The standardization can be applied before or after considering the complementary event. If we try solving the integral,
$$P(X > -4.4) = \int_{-4.4}^{+\infty} f(x)\,dx = \int_{-4.4}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \;?$$
we are not able to find an antiderivative of $f(x)$, because none exists in terms of elementary functions. We may remember that the antiderivative of $e^{-x^2}$ cannot be written in closed form, and that the definite integral of $f(x)$ can be solved exactly only for some limits of integration, although it can always be solved numerically. On the other hand, by using the statistical software program R, whose function contains cumulative probabilities for events of the form $\{X < x\}$,
> 1 - pnorm(-4.4, -1, sqrt(4))
[1] 0.9554345
To plot the density function:
values = seq(-10, 10, length=100)
probabilities = dnorm(values, -1, 2)
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="N(-1, sd=2)")

(f) The parameter value is $\kappa = 16$. The set of possible values is always composed of all positive real numbers. Most tables of the chi-square distribution provide the probability of events of the form $P(X > x)$. In this case, it is necessary to consider the complementary event before looking for the quantile:
$$P(X \le a) = 0.025 \;\Leftrightarrow\; P(X > a) = 1 - 0.025 = 0.975 \;\Rightarrow\; a = 6.91.$$
We do not use the density function, as it is too complex. By using the statistical software program R, whose function gives quantiles for events of the form $\{X < x\}$,
> qchisq(0.025, 16)
[1] 6.907664
To plot the density function:
values = seq(0, 40, length=100)
probabilities = dchisq(values, 16)
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="Chi-sq(16)")

(g) Now the parameter value is $\kappa = 27$. A variable enjoying the t distribution can take any real value. Most tables of this distribution provide the probability of events of the form $P(X > x)$. In this case, it is not necessary to rewrite the event:
$$P(X > a) = 0.1 \;\Rightarrow\; a = 1.31.$$
The density function is too complex to be used. The statistical software program R allows doing it (the function provides quantiles for events of the form $\{X < x\}$):

> qt(1 - 0.1, 27)
[1] 1.313703
To plot the density function:
values = seq(-5, 5, length = 100)
probabilities = dt(values, 27)
plot(values, probabilities, type = "l", lwd = 2, ylim = c(0, 1), xlab = "value", ylab = "density", main = "t(27)")

(h) The parameter values for this F distribution are κ1 = 10 and κ2 = 8. The possible values are always all positive real numbers. Again, most tables of this distribution provide the probability for events of the form {X > x}, so:
$P(X > 5.81) = 0.01$
The density function is also complex. Finally, by using the computer:
> 1 - pf(5.81, 10, 8)
This returns approximately 0.01004; the small difference comes from the rounding of the table value 5.81.
To plot the density function:
values = seq(0, 10, length = 100)
probabilities = df(values, 10, 8)
plot(values, probabilities, type = "l", lwd = 2, ylim = c(0, 1), xlab = "value", ylab = "density", main = "F(10, 8)")

(i) Now, the parameter values are κ1 = 15 and κ2 = 6. Then:
$P(X > a) = 0.01 \implies a = 7.56$
The density function is also complex. By again using the computer:
> qf(1 - 0.01, 15, 6)
This returns approximately 7.559.
To plot the density function:
values = seq(0, 10, length = 100)
probabilities = df(values, 15, 6)
plot(values, probabilities, type = "l", lwd = 2, ylim = c(0, 1), xlab = "value", ylab = "density", main = "F(15, 6)")

(j) Since the parameter value is κ = 12, we decompose the event into two disjoint events:
$P(\{X \le 1.356\} \cup \{X > 3.055\}) = P(X \le 1.356) + P(X > 3.055) = 1 - P(X > 1.356) + P(X > 3.055) = 1 - 0.1 + 0.005 = 0.905$
The density function is also complex. Finally,

> pt(1.356, 12) + 1 - pt(3.055, 12)
This returns approximately 0.905; small differences come from the rounding of the table quantiles.
To plot the density function:
values = seq(-10, 10, length = 100)
probabilities = dt(values, 12)
plot(values, probabilities, type = "l", lwd = 2, ylim = c(0, 1), xlab = "value", ylab = "density", main = "t(12)")

My notes:

Exercise 2pt
Weekly maintenance costs (measured in dollars, $) for a certain factory, recorded over a long period of time and adjusted for inflation, tend to have an approximately normal distribution with an average of $420 and a standard deviation of $30. If $450 is budgeted for next week, what is an approximate probability that this budgeted figure will be exceeded?
(Taken from Mathematical Statistics with Applications. W. Mendenhall, D.D. Wackerly and R.L. Scheaffer. Duxbury Press)

Discussion: We need to extract the mathematical information from the statement. There is a quantity, the weekly maintenance costs, say C, that can be assumed to follow the distribution
$C \sim N(\mu = \$420, \sigma = \$30)$
or, in terms of the variance,
$C \sim N(\mu = \$420, \sigma^2 = (\$30)^2 = 900\,\$^2)$
In practice, this supposition should be evaluated. We are asked for the probability P(C > $450). Since C does not follow a standard normal distribution, we standardize both sides of the inequality. We use μ = E(C) = $420 and σ = √Var(C) = $30, so that the table of the standard normal distribution can be used:
$$P(C > \$450) = P\left(\frac{C-\mu}{\sigma} > \frac{\$450 - \$420}{\$30}\right) = P(T > 1) = 1 - P(T \le 1) = 1 - 0.8413 = 0.1587$$

My notes:

Exercise 3pt *
Find the first two raw (or crude) moments of a random variable X when it enjoys:
(1) The Bernoulli distribution
(2) The binomial distribution
(3) The geometric distribution
(4) The Poisson distribution
(5) The exponential distribution
(6) The normal distribution

Use the following concepts to do the calculations in several ways: (i) their definition; (ii) the probability generating function; (iii) the moment generating function; (iv) the characteristic function; or (v) others. Then, find the mean and the variance of X.

Discussion: Different methods can be applied to calculate the first two moments. We have practiced as many of them as possible, both to learn as much as possible and to compare their difficulty. Besides, some of them are more powerful than others.

Some of these calculations are advanced. To work with characteristic functions, the definitions and rules of the analysis for complex functions of a real variable must be considered. Some calculations may even be easier if we work with the theory for complex functions of a complex variable. Most of these definitions and rules are natural generalizations of those of real analysis, but we must be careful not to apply them without the necessary justification.

(1) The Bernoulli distribution
By applying the definitions:
$E(X) = \sum_{x=0}^{1} x\,\eta^x(1-\eta)^{1-x} = 0\cdot(1-\eta) + 1\cdot\eta = \eta$
$E(X^2) = \sum_{x=0}^{1} x^2\,\eta^x(1-\eta)^{1-x} = 0^2\cdot(1-\eta) + 1^2\cdot\eta = \eta$
By using the probability generating function:
$G(t) = E(t^X) = \sum_{x=0}^{1} t^x\eta^x(1-\eta)^{1-x} = t^0(1-\eta) + t^1\eta = 1 - \eta + \eta t$
This function exists for any t. Now, the usual definitions and rules of the mathematical analysis for real functions of a real variable imply that:
$E(X) = G'(1) = [\eta]_{t=1} = \eta$
$E(X^2) = G''(1) + E(X) = [0]_{t=1} + \eta = \eta$
By using the moment generating function:
$M(t) = E(e^{tX}) = \sum_{x=0}^{1} e^{tx}\eta^x(1-\eta)^{1-x} = e^{0}(1-\eta) + e^{t}\eta = 1 - \eta + \eta e^t$
This function exists for any real t. Because of the mathematical real analysis:
$E(X) = M'(0) = [\eta e^t]_{t=0} = \eta$
$E(X^2) = M''(0) = [\eta e^t]_{t=0} = \eta$
By using the characteristic function:
$\varphi(t) = E(e^{itX}) = \sum_{x=0}^{1} e^{itx}\eta^x(1-\eta)^{1-x} = e^{0}(1-\eta) + e^{it}\eta = 1 - \eta + \eta e^{it}$
This complex function exists for any real t. Complex analysis is used to obtain:
$E(X) = \frac{\varphi'(0)}{i} = \frac{[\eta e^{it} i]_{t=0}}{i} = \frac{\eta i}{i} = \eta$
$E(X^2) = \frac{\varphi''(0)}{i^2} = \frac{[\eta e^{it} i^2]_{t=0}}{i^2} = \frac{\eta i^2}{i^2} = \eta$
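The four routes used above for the Bernoulli distribution rest on the same general identities, which are applied again for every distribution below. A compact summary follows, stated as standard facts. It assumes that each function exists on a neighbourhood of the point where it is differentiated. For G, which is used here for variables with values in {0, 1, 2, ...}, left-hand derivatives at t = 1 may be used. For φ, it suffices that the corresponding moment is finite.

$$
\begin{aligned}
&\text{(ii) PGF:} && E(X) = G'(1), && E(X^2) = G''(1) + G'(1), \text{ since } G''(1) = E[X(X-1)],\\
&\text{(iii) MGF:} && E(X) = M'(0), && E(X^2) = M''(0),\\
&\text{(iv) CF:} && E(X) = \frac{\varphi'(0)}{i}, && E(X^2) = \frac{\varphi''(0)}{i^2} = -\varphi''(0).
\end{aligned}
$$

For the Bernoulli case, G(t) = 1 − η + ηt gives G'(1) = η and G''(1) = 0, which reproduces E(X) = E(X²) = η.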

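Mean and variance:
$\mu = E(X) = \eta$, $\quad\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \eta - \eta^2 = \eta(1-\eta)$

(2) The binomial distribution
By applying the definitions:
$E(X) = \sum_{x=0}^{\kappa} x\binom{\kappa}{x}\eta^x(1-\eta)^{\kappa-x} = \ ?$
$E(X^2) = \sum_{x=0}^{\kappa} x^2\binom{\kappa}{x}\eta^x(1-\eta)^{\kappa-x} = \ ?$
A possible way consists in writing X as the sum of κ independent Bernoulli variables, $X = \sum_{j=1}^{\kappa} X_j$:
$E(X) = E\left(\sum_{j=1}^{\kappa} X_j\right) = \sum_{j=1}^{\kappa} E(X_j) = \kappa\eta$
$E(X^2) = E\left[\left(\sum_{j=1}^{\kappa} X_j\right)^2\right] = \ ?$
This way can also be used to calculate the variance easily, but it is not so direct for the second moment:
$\sigma^2 = Var(X) = Var\left(\sum_{j=1}^{\kappa} X_j\right) = \sum_{j=1}^{\kappa} Var(X_j) = \kappa\eta(1-\eta)$
By using the probability generating function:
$G(t) = E(t^X) = \sum_{x=0}^{\kappa} t^x\binom{\kappa}{x}\eta^x(1-\eta)^{\kappa-x} = \sum_{x=0}^{\kappa}\binom{\kappa}{x}(\eta t)^x(1-\eta)^{\kappa-x} = [\eta t + (1-\eta)]^{\kappa} = (1 - \eta + \eta t)^{\kappa}$
Here the binomial theorem (see the appendixes of Mathematics) has been applied. Alternatively, this function can be calculated by looking at X as a sum of Bernoulli variables X_j and applying a property of probability generating functions of a sum of independent random variables:
$G_X(t) = [G_{X_j}(t)]^{\kappa} = (1 - \eta + \eta t)^{\kappa}$
This function exists for any t. Again, the rules of real analysis allow us to obtain:
$E(X) = G'(1) = [\kappa(1-\eta+\eta t)^{\kappa-1}\eta]_{t=1} = \kappa\eta$
$E(X^2) = G''(1) + E(X) = [\kappa(\kappa-1)(1-\eta+\eta t)^{\kappa-2}\eta^2]_{t=1} + \kappa\eta = \kappa(\kappa-1)\eta^2 + \kappa\eta$
By using the moment generating function:
$M(t) = E(e^{tX}) = \sum_{x=0}^{\kappa} e^{tx}\binom{\kappa}{x}\eta^x(1-\eta)^{\kappa-x} = \sum_{x=0}^{\kappa}\binom{\kappa}{x}(\eta e^t)^x(1-\eta)^{\kappa-x} = (1 - \eta + \eta e^t)^{\kappa}$
Again, it is also possible to look at X as a sum of Bernoulli variables X_j and apply a property for moment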

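generating functions of a sum of independent random variables:
$M_X(t) = [M_{X_j}(t)]^{\kappa} = (1 - \eta + \eta e^t)^{\kappa}$
This function exists for any real t. Because of the mathematical real analysis:
$E(X) = M'(0) = [\kappa(1-\eta+\eta e^t)^{\kappa-1}\eta e^t]_{t=0} = \kappa\eta$
$E(X^2) = M''(0) = [\kappa(\kappa-1)(1-\eta+\eta e^t)^{\kappa-2}(\eta e^t)^2 + \kappa(1-\eta+\eta e^t)^{\kappa-1}\eta e^t]_{t=0} = \kappa(\kappa-1)\eta^2 + \kappa\eta$
By using the characteristic function:
$\varphi(t) = E(e^{itX}) = \sum_{x=0}^{\kappa} e^{itx}\binom{\kappa}{x}\eta^x(1-\eta)^{\kappa-x} = \sum_{x=0}^{\kappa}\binom{\kappa}{x}(\eta e^{it})^x(1-\eta)^{\kappa-x} = (1 - \eta + \eta e^{it})^{\kappa}$
Once more, we can look at X as a sum of Bernoulli variables X_j and apply a property of characteristic functions of a sum of independent random variables:
$\varphi_X(t) = [\varphi_{X_j}(t)]^{\kappa} = (1 - \eta + \eta e^{it})^{\kappa}$
This complex function exists for any real t. Again, complex analysis is used to obtain:
$E(X) = \frac{\varphi'(0)}{i} = \frac{[\kappa(1-\eta+\eta e^{it})^{\kappa-1}\eta e^{it} i]_{t=0}}{i} = \frac{\kappa\eta i}{i} = \kappa\eta$
$E(X^2) = \frac{\varphi''(0)}{i^2} = \frac{[\kappa(\kappa-1)(1-\eta+\eta e^{it})^{\kappa-2}(\eta e^{it} i)^2 + \kappa(1-\eta+\eta e^{it})^{\kappa-1}\eta e^{it} i^2]_{t=0}}{i^2} = \frac{\kappa(\kappa-1)\eta^2 i^2 + \kappa\eta i^2}{i^2} = \kappa(\kappa-1)\eta^2 + \kappa\eta$
Mean and variance:
$\mu = E(X) = \kappa\eta$, $\quad\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \kappa(\kappa-1)\eta^2 + \kappa\eta - \kappa^2\eta^2 = \kappa\eta(1-\eta)$

(3) The geometric distribution
By applying the definitions:
$E(X) = \sum_{x=1}^{\infty} x\,\eta(1-\eta)^{x-1} = \ ?$
$E(X^2) = \sum_{x=1}^{\infty} x^2\,\eta(1-\eta)^{x-1} = \ ?$
As an example, I include a way to calculate E(X) that I found. We must first prove that any moment of order r is finite or, equivalently, that the series is absolutely convergent. To do so, we apply the ratio test for nonnegative series: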

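$\lim_{x\to\infty}\frac{a_{x+1}}{a_x} = \lim_{x\to\infty}\frac{(x+1)^r\eta(1-\eta)^{x}}{x^r\eta(1-\eta)^{x-1}} = (1-\eta)\lim_{x\to\infty}\left(\frac{x+1}{x}\right)^r = 1-\eta < 1$
Mathematically, the radius of convergence is 1, that is:
$|1-\eta| < 1 \iff -1 < 1-\eta < 1 \iff -2 < -\eta < 0 \iff 2 > \eta > 0$
Probabilistically, the meaning of the parameter η in the geometric distribution (a probability between 0 and 1) implies that the series is convergent for any η. Either way, this implies that $\sum_{x=1}^{\infty} x^r\eta(1-\eta)^{x-1} < \infty$. Once the convergence has been proved, the rules of the usual arithmetic for finite quantities can be applied. The convergence of the series is crucial for the following calculations:
$E(X) = \sum_{x=1}^{\infty} x\,\eta(1-\eta)^{x-1} = \eta\sum_{x=0}^{\infty}(x+1)(1-\eta)^{x} = (1-\eta)\sum_{x=0}^{\infty} x\,\eta(1-\eta)^{x-1} + \eta\sum_{x=0}^{\infty}(1-\eta)^x = (1-\eta)E(X) + \frac{\eta}{1-(1-\eta)} = (1-\eta)E(X) + 1$
Hence $\eta E(X) = 1$, that is, $E(X) = 1/\eta$. Here the formula of the geometric series (see the appendixes of Mathematics) has been used, and the term x = 0 of the first sum is zero. Alternatively, μ can be calculated by applying the formula for arithmetico-geometric series available in the literature.
By using the probability generating function:
$G(t) = E(t^X) = \sum_{x=1}^{\infty} t^x\eta(1-\eta)^{x-1} = \eta t\sum_{x=0}^{\infty}[t(1-\eta)]^x = \frac{\eta t}{1-(1-\eta)t}$
Given η, this function exists for t such that |t(1-η)| < 1. Otherwise the series does not converge, as the following criterion shows:
$\lim_{x\to\infty}\left|\frac{a_{x+1}}{a_x}\right| = \lim_{x\to\infty}\left|\frac{t^{x+1}(1-\eta)^{x}}{t^{x}(1-\eta)^{x-1}}\right| = |t(1-\eta)| < 1$
The definitions and rules of the mathematical analysis for real functions of a real variable give $G'(t) = \frac{\eta}{[1-(1-\eta)t]^2}$ and $G''(t) = \frac{2\eta(1-\eta)}{[1-(1-\eta)t]^3}$, so:
$E(X) = G'(1) = \frac{\eta}{\eta^2} = \frac{1}{\eta}$
$E(X^2) = G''(1) + E(X) = \frac{2\eta(1-\eta)}{\eta^3} + \frac{1}{\eta} = \frac{2-\eta}{\eta^2}$
By using the moment generating function:
$M(t) = E(e^{tX}) = \sum_{x=1}^{\infty} e^{tx}\eta(1-\eta)^{x-1} = \eta e^t\sum_{x=0}^{\infty}[e^t(1-\eta)]^x = \frac{\eta e^t}{1-(1-\eta)e^t}$
This function exists for any real t such that $e^t(1-\eta) < 1$. Otherwise the series does not converge, as the following criterion shows:
$\lim_{x\to\infty}\frac{a_{x+1}}{a_x} = \lim_{x\to\infty}\frac{e^{t(x+1)}(1-\eta)^{x}}{e^{tx}(1-\eta)^{x-1}} = e^t(1-\eta) < 1$
Because of the mathematical real analysis,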

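$E(X) = M'(0) = \left[\frac{\eta e^t[1-(1-\eta)e^t] + \eta e^t(1-\eta)e^t}{[1-(1-\eta)e^t]^2}\right]_{t=0} = \left[\frac{\eta e^t}{[1-(1-\eta)e^t]^2}\right]_{t=0} = \frac{\eta}{\eta^2} = \frac{1}{\eta}$
$E(X^2) = M''(0) = \left[\frac{\eta e^t[1-(1-\eta)e^t]^2 + 2\eta e^t[1-(1-\eta)e^t](1-\eta)e^t}{[1-(1-\eta)e^t]^4}\right]_{t=0} = \left[\frac{\eta e^t[1+(1-\eta)e^t]}{[1-(1-\eta)e^t]^3}\right]_{t=0} = \frac{\eta(2-\eta)}{\eta^3} = \frac{2-\eta}{\eta^2}$
By using the characteristic function:
$\varphi(t) = E(e^{itX}) = \sum_{x=1}^{\infty} e^{itx}\eta(1-\eta)^{x-1} = \eta e^{it}\sum_{x=0}^{\infty}[e^{it}(1-\eta)]^x = \frac{\eta e^{it}}{1-(1-\eta)e^{it}}$
This complex function exists for any real t such that $|e^{it}(1-\eta)| < 1$, where |z| denotes the modulus of a complex number z. Otherwise the series does not converge, as the following criterion shows:
$\lim_{x\to\infty}\left|\frac{a_{x+1}}{a_x}\right| = |e^{it}(1-\eta)| < 1$
Since $|e^{it}| = 1$, this condition holds for every real t when $0 < \eta \le 1$. Once more, complex analysis allows us to obtain:
$E(X) = \frac{\varphi'(0)}{i} = \frac{1}{i}\left[\frac{\eta e^{it}i[1-(1-\eta)e^{it}] + \eta e^{it}(1-\eta)e^{it}i}{[1-(1-\eta)e^{it}]^2}\right]_{t=0} = \frac{1}{i}\left[\frac{\eta e^{it}i}{[1-(1-\eta)e^{it}]^2}\right]_{t=0} = \frac{\eta i}{i\eta^2} = \frac{1}{\eta}$
$E(X^2) = \frac{\varphi''(0)}{i^2} = \frac{1}{i^2}\left[\frac{\eta e^{it}i^2[1+(1-\eta)e^{it}]}{[1-(1-\eta)e^{it}]^3}\right]_{t=0} = \frac{\eta i^2(2-\eta)}{i^2\eta^3} = \frac{2-\eta}{\eta^2}$
Mean and variance:
$\mu = E(X) = \frac{1}{\eta}$, $\quad\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \frac{2-\eta}{\eta^2} - \frac{1}{\eta^2} = \frac{1-\eta}{\eta^2}$

Advanced theory:
Additional way 1: In Cálculo de probabilidades I, by Vélez, R., and V. Hernández (UNED), the first four moments are calculated as follows. I write the calculations for the first two moments, with the notation we are using:
$E(X) = \sum_{x=1}^{\infty} x\,\eta(1-\eta)^{x-1} = \eta\sum_{x=1}^{\infty} x(1-\eta)^{x-1} = -\eta\frac{d}{d\eta}\sum_{x=1}^{\infty}(1-\eta)^x = -\eta\frac{d}{d\eta}\left[\frac{1-\eta}{\eta}\right] = -\eta\left(-\frac{1}{\eta^2}\right) = \frac{1}{\eta}$
$E(X^2) = \sum_{x=1}^{\infty} x^2\eta(1-\eta)^{x-1} = \eta\sum_{x=1}^{\infty} x^2(1-\eta)^{x-1} = -\eta\frac{d}{d\eta}\sum_{x=1}^{\infty} x(1-\eta)^{x}$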

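$\qquad = -\eta\frac{d}{d\eta}\left[\frac{1-\eta}{\eta}E(X)\right] = -\eta\frac{d}{d\eta}\left[\frac{1-\eta}{\eta^2}\right] = -\eta\,\frac{\eta-2}{\eta^3} = \frac{2-\eta}{\eta^2}$
We have already justified the convergence of the series involved. Inside the radius of convergence, they can be differentiated term by term.

Additional way 2: I tried to find a way based on calculating the main part of the series by using an ordinary differential equation, as I had previously done for the Poisson distribution in the next section. I found the following way, which is essentially the same as additional way 1. A series can be differentiated and integrated term by term inside the circle of convergence (the radius of convergence was one, which included all possible values for η). The expression of the mean suggests the following definition for g(η):
$E(X) = \sum_{x=1}^{\infty} x\,\eta(1-\eta)^{x-1} = \eta\,g(\eta)$, with $g(\eta) = \sum_{x=1}^{\infty} x(1-\eta)^{x-1}$
Since g is a well-behaved function of η, it follows that:
$G(\eta) = \int g(\eta)\,d\eta = \sum_{x=1}^{\infty}\int x(1-\eta)^{x-1}\,d\eta = -\sum_{x=1}^{\infty}(1-\eta)^x + c = -\frac{1-\eta}{\eta} + c$
I spent some time searching for a differential equation... and I found this integral one. Now, by solving it:
$g(\eta) = G'(\eta) = \frac{1}{\eta^2}$
This is a general method to calculate some infinite series. Finally, the mean is:
$E(X) = \eta\,g(\eta) = \frac{\eta}{\eta^2} = \frac{1}{\eta}$
For the second moment, we define:
$E(X^2) = \sum_{x=1}^{\infty} x^2\eta(1-\eta)^{x-1} = \eta\,g(\eta)$, with $g(\eta) = \sum_{x=1}^{\infty} x^2(1-\eta)^{x-1}$
It follows that:
$G(\eta) = \int g(\eta)\,d\eta = -\sum_{x=1}^{\infty} x(1-\eta)^x + c = -\frac{1-\eta}{\eta}\sum_{x=1}^{\infty} x\,\eta(1-\eta)^{x-1} + c = -\frac{1-\eta}{\eta^2} + c$
Now, by solving this trivial integral equation:
$g(\eta) = G'(\eta) = \frac{d}{d\eta}\left[\frac{1}{\eta} - \frac{1}{\eta^2}\right] = -\frac{1}{\eta^2} + \frac{2}{\eta^3} = \frac{2-\eta}{\eta^3}$
Finally, the second moment is:
$E(X^2) = \eta\,g(\eta) = \eta\,\frac{2-\eta}{\eta^3} = \frac{2-\eta}{\eta^2}$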
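The geometric results can be sanity-checked numerically in R by truncating the series. This sketch assumes the parametrization used above, P(X = x) = η(1 − η)^(x−1) for x = 1, 2, 3, .... Note that R's built-in dgeom counts failures before the first success, so its argument must be shifted by one. The value η = 0.3 and the truncation at 2000 terms are arbitrary choices for illustration.

```r
# Numerical sanity check of the geometric moments, assuming the
# parametrization P(X = x) = eta * (1 - eta)^(x - 1), x = 1, 2, 3, ...
eta <- 0.3                        # arbitrary illustrative value in (0, 1)
x <- 1:2000                       # truncation point; the neglected tail is negligible here
p <- eta * (1 - eta)^(x - 1)      # probability mass function

# dgeom() counts failures before the first success (support 0, 1, 2, ...),
# so the same probabilities are obtained with the shifted argument x - 1
stopifnot(isTRUE(all.equal(p, dgeom(x - 1, prob = eta))))

m1 <- sum(x * p)                  # truncated series for E(X)
m2 <- sum(x^2 * p)                # truncated series for E(X^2)

print(c(E_X = m1, formula = 1 / eta))
print(c(E_X2 = m2, formula = (2 - eta) / eta^2))
print(c(Var_X = m2 - m1^2, formula = (1 - eta) / eta^2))
```

Truncating is safe here because the ratio test above shows that the terms decay geometrically, with ratio 1 − η.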

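Remark: Working with the whole series of μ(η) or σ²(η), as functions of η, is more difficult than working with the previous functions g(η), since the variable η would appear twice instead of once. I spent some time before I realized it.

(4) The Poisson distribution
By applying the definitions:
$E(X) = \sum_{x=0}^{\infty} x\frac{\lambda^x}{x!}e^{-\lambda} = \ ?$
$E(X^2) = \sum_{x=0}^{\infty} x^2\frac{\lambda^x}{x!}e^{-\lambda} = \ ?$
To prove that any moment of order r is finite or, equivalently, that the series is convergent, we apply the ratio test for nonnegative series:
$\lim_{x\to\infty}\frac{a_{x+1}}{a_x} = \lim_{x\to\infty}\frac{(x+1)^r\lambda^{x+1}e^{-\lambda}/(x+1)!}{x^r\lambda^x e^{-\lambda}/x!} = \lim_{x\to\infty}\left(\frac{x+1}{x}\right)^r\frac{\lambda}{x+1} = 0 < 1$
This implies that $\sum_{x=0}^{\infty} x^r\frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}\sum_{x=0}^{\infty} x^r\frac{\lambda^x}{x!} < \infty$. Once the absolute convergence has been proved, the rules of the usual arithmetic for finite quantities could be applied. Nevertheless, factorials in a series make it easy to prove the convergence but difficult to find the value.
By using the probability generating function:
$G(t) = E(t^X) = \sum_{x=0}^{\infty} t^x\frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(t\lambda)^x}{x!} = e^{-\lambda}e^{t\lambda} = e^{\lambda(t-1)}$
This function exists for any t, as the following criterion shows:
$\lim_{x\to\infty}\left|\frac{a_{x+1}}{a_x}\right| = \lim_{x\to\infty}\frac{|t\lambda|^{x+1}/(x+1)!}{|t\lambda|^x/x!} = \lim_{x\to\infty}\frac{|t\lambda|}{x+1} = 0 < 1$
Now, the definitions and rules of the mathematical analysis for real functions of a real variable give:
$E(X) = G'(1) = [\lambda e^{\lambda(t-1)}]_{t=1} = \lambda$
$E(X^2) = G''(1) + E(X) = [\lambda^2 e^{\lambda(t-1)}]_{t=1} + \lambda = \lambda^2 + \lambda$
By using the moment generating function:
$M(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx}\frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(e^t\lambda)^x}{x!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t-1)}$
This function exists for any real t, as the following criterion shows:
$\lim_{x\to\infty}\frac{a_{x+1}}{a_x} = \lim_{x\to\infty}\frac{e^t\lambda}{x+1} = 0 < 1$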

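Because of the mathematical real analysis:
$E(X) = M'(0) = [e^{\lambda(e^t-1)}\lambda e^t]_{t=0} = \lambda$
$E(X^2) = M''(0) = [e^{\lambda(e^t-1)}(\lambda e^t)^2 + e^{\lambda(e^t-1)}\lambda e^t]_{t=0} = \lambda^2 + \lambda$
By using the characteristic function:
$\varphi(t) = E(e^{itX}) = \sum_{x=0}^{\infty} e^{itx}\frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(e^{it}\lambda)^x}{x!} = e^{-\lambda}e^{\lambda e^{it}} = e^{\lambda(e^{it}-1)}$
This function exists for any real t, as the following criterion shows:
$\lim_{x\to\infty}\left|\frac{a_{x+1}}{a_x}\right| = \lim_{x\to\infty}\frac{|e^{it}\lambda|}{x+1} = 0 < 1$
The definitions and rules of the analysis for complex functions have been applied in the previous calculations; they are similar to those for real functions of a real variable. Now, by using the analysis for complex functions of one real variable:
$E(X) = \frac{\varphi'(0)}{i} = \frac{[e^{\lambda(e^{it}-1)}\lambda e^{it} i]_{t=0}}{i} = \frac{\lambda i}{i} = \lambda$
$E(X^2) = \frac{\varphi''(0)}{i^2} = \frac{[e^{\lambda(e^{it}-1)}(\lambda e^{it} i)^2 + e^{\lambda(e^{it}-1)}\lambda e^{it} i^2]_{t=0}}{i^2} = \frac{\lambda^2 i^2 + \lambda i^2}{i^2} = \lambda^2 + \lambda$
Mean and variance:
$\mu = E(X) = \lambda$, $\quad\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$

Advanced theory:
Additional way 1: While looking for ways to do this, I found the following one. A series can be differentiated and integrated term by term inside its circle of convergence. The limit calculated at the beginning was the same for any λ, so the radius of convergence is infinite when the series is looked at as a function of λ. The expression of the mean suggests the following definition for g(λ):
$E(X) = \sum_{x=0}^{\infty} x\frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}g(\lambda)$, with $g(\lambda) = \sum_{x=0}^{\infty} x\frac{\lambda^x}{x!}$
Since g is a well-behaved function of λ, it follows that:
$g'(\lambda) = \sum_{x=1}^{\infty} x\frac{x\lambda^{x-1}}{x!} = \sum_{x=1}^{\infty}[(x-1)+1]\frac{\lambda^{x-1}}{(x-1)!} = \sum_{x=1}^{\infty}(x-1)\frac{\lambda^{x-1}}{(x-1)!} + \sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!} = g(\lambda) + e^{\lambda}$
Now, we solve the first-order ordinary differential equation $g'(\lambda) - g(\lambda) = e^{\lambda}$.
Homogeneous equation: $g'(\lambda) - g(\lambda) = 0 \implies \frac{dg}{g} = d\lambda \implies \log|g| = \lambda + k \implies g_h(\lambda) = e^{\lambda + k} = c\,e^{\lambda}$
Particular solution: We apply, for example, the method of variation of parameters (or constants). We substitute $g(\lambda) = c(\lambda)e^{\lambda}$ and $g'(\lambda) = c'(\lambda)e^{\lambda} + c(\lambda)e^{\lambda}$ into the equation:
$c'(\lambda)e^{\lambda} + c(\lambda)e^{\lambda} - c(\lambda)e^{\lambda} = e^{\lambda} \implies c'(\lambda) = 1 \implies c(\lambda) = \lambda \implies g_p(\lambda) = \lambda e^{\lambda}$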

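General solution: $g(\lambda) = g_h(\lambda) + g_p(\lambda) = c\,e^{\lambda} + \lambda e^{\lambda} = (c + \lambda)e^{\lambda}$
Any g(λ) given by the previous expression verifies the differential equation, so an additional condition is necessary to determine the value of c. The initial definition implies that g(0) = 0, so c = 0. Finally, the mean is:
$E(X) = e^{-\lambda}g(\lambda) = e^{-\lambda}\lambda e^{\lambda} = \lambda$
The same can be done to calculate some infinite series. For the second moment, we define:
$E(X^2) = \sum_{x=0}^{\infty} x^2\frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}g(\lambda)$, with $g(\lambda) = \sum_{x=0}^{\infty} x^2\frac{\lambda^x}{x!}$
Since g is a well-behaved function of λ, it follows that:
$g'(\lambda) = \sum_{x=1}^{\infty} x^2\frac{x\lambda^{x-1}}{x!} = \sum_{x=1}^{\infty}[(x-1)+1]^2\frac{\lambda^{x-1}}{(x-1)!} = \sum_{x=1}^{\infty}(x-1)^2\frac{\lambda^{x-1}}{(x-1)!} + 2\sum_{x=1}^{\infty}(x-1)\frac{\lambda^{x-1}}{(x-1)!} + \sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!} = g(\lambda) + 2\lambda e^{\lambda} + e^{\lambda}$
The expression for the expectation of X has been used in the middle term. Thus, the function we are looking for verifies the first-order ordinary differential equation $g'(\lambda) - g(\lambda) = (2\lambda + 1)e^{\lambda}$.
Homogeneous equation: This equation is the same as before, so $g_h(\lambda) = c\,e^{\lambda}$.
Particular solution: By applying the same method:
$c'(\lambda)e^{\lambda} + c(\lambda)e^{\lambda} - c(\lambda)e^{\lambda} = (2\lambda+1)e^{\lambda} \implies c'(\lambda) = 2\lambda + 1 \implies c(\lambda) = \lambda^2 + \lambda \implies g_p(\lambda) = (\lambda^2 + \lambda)e^{\lambda}$
General solution: $g(\lambda) = g_h(\lambda) + g_p(\lambda) = (c + \lambda^2 + \lambda)e^{\lambda}$
Any g(λ) given by the previous expression verifies the differential equation, so an additional condition is necessary to determine the value of c. The definition above implies that g(0) = 0, so c = 0. Finally, the second moment is:
$E(X^2) = e^{-\lambda}g(\lambda) = e^{-\lambda}(\lambda^2 + \lambda)e^{\lambda} = \lambda^2 + \lambda$
Remark: Working with the whole series of μ(λ) or σ²(λ) as functions of λ is more difficult than working with the previous functions g(λ), since the variable λ would appear twice instead of once.

Additional way 2: Another way consists in using a relation involving the polynomials $P_n$ below. The author calls them Stirling polynomials. They are built from Stirling numbers of the second kind and are also known as Touchard polynomials (see, e.g., Análisis combinatorio: problemas y ejercicios, Ríbnikov et al., Mir):
$\sum_{j=0}^{\infty} j^n\frac{x^j}{j!} = e^x P_n(x)$, with $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = x^2 + x$, ..., $P_{n+1}(x) = x\sum_{j=0}^{n}\binom{n}{j}P_j(x)$
In this case:
$E(X) = e^{-\lambda}\sum_{x=0}^{\infty} x\frac{\lambda^x}{x!} = e^{-\lambda}e^{\lambda}P_1(\lambda) = \lambda$
$E(X^2) = e^{-\lambda}\sum_{x=0}^{\infty} x^2\frac{\lambda^x}{x!} = e^{-\lambda}e^{\lambda}P_2(\lambda) = \lambda^2 + \lambda$

(5) The exponential distribution
By applying the definitions:
$E(X) = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = [-x e^{-\lambda x}]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = 0 + \left[-\frac{e^{-\lambda x}}{\lambda}\right]_0^{\infty} = \frac{1}{\lambda}$
Here the formula $\int u(x)v'(x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx$ of integration by parts has been applied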

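with x and $\lambda e^{-\lambda x}$ as initial functions, since these two functions are of different type:
$u(x) = x \implies u'(x) = 1$; $\quad v'(x) = \lambda e^{-\lambda x} \implies v(x) = \int\lambda e^{-\lambda x}\,dx = -e^{-\lambda x}$
For the second-order moment:
$E(X^2) = \int_0^{\infty} x^2\lambda e^{-\lambda x}\,dx = [-x^2e^{-\lambda x}]_0^{\infty} + \int_0^{\infty} 2x e^{-\lambda x}\,dx = 0 + \frac{2}{\lambda}\int_0^{\infty} x\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda}\mu = \frac{2}{\lambda^2}$
Here the same formula of integration by parts has been applied, with $x^2$ and $\lambda e^{-\lambda x}$ as initial functions, since these two functions are of different type:
$u(x) = x^2 \implies u'(x) = 2x$; $\quad v'(x) = \lambda e^{-\lambda x} \implies v(x) = -e^{-\lambda x}$
The fact that $e^x$ grows faster than $x^k$, for any k, has also been used in calculating both integrals. On the other hand, for the exponential distribution λ > 0, so the previous integrals always converge.
By using the moment generating function:
$M(t) = E(e^{tX}) = \int_0^{\infty} e^{tx}\lambda e^{-\lambda x}\,dx = \lambda\int_0^{\infty} e^{x(t-\lambda)}\,dx = \lambda\left[\frac{e^{x(t-\lambda)}}{t-\lambda}\right]_0^{\infty} = \frac{\lambda}{\lambda - t}$
This function exists for real t such that t − λ < 0; otherwise, the integral does not converge. Because of the mathematical real analysis:
$E(X) = M'(0) = \left[\frac{\lambda}{(\lambda-t)^2}\right]_{t=0} = \frac{1}{\lambda}$
$E(X^2) = M''(0) = \left[\frac{2\lambda}{(\lambda-t)^3}\right]_{t=0} = \frac{2}{\lambda^2}$
By using the characteristic function:
$\varphi(t) = E(e^{itX}) = \int_0^{\infty} e^{itx}\lambda e^{-\lambda x}\,dx = \lambda\int_0^{\infty} e^{x(it-\lambda)}\,dx = \lambda\lim_{M\to\infty}\int_{\{z=\gamma,\ 0\le\gamma\le M\}} e^{z(it-\lambda)}\,dz = \lambda\lim_{M\to\infty}\left[\frac{e^{z(it-\lambda)}}{it-\lambda}\right]_{z=0}^{z=M} = \lambda\lim_{M\to\infty}\frac{e^{-\lambda M}e^{iMt} - 1}{it-\lambda} = \frac{\lambda}{\lambda - it}$
The last step uses $|e^{-\lambda M}e^{iMt}| = e^{-\lambda M} \to 0$. This function exists for any real t such that it − λ ≠ 0 (dividing by zero is not allowed), which always holds because λ > 0. In the previous calculation, the fact that the complex integrand is differentiable has been used to calculate the complex line integral by using an antiderivative and the equivalent of Barrow's rule. Now, the definitions and rules of the analysis for complex functions of a real variable must be considered to obtain:
$E(X) = \frac{\varphi'(0)}{i} = \frac{1}{i}\left[\frac{\lambda i}{(\lambda-it)^2}\right]_{t=0} = \frac{1}{\lambda}$
$E(X^2) = \frac{\varphi''(0)}{i^2} = \frac{1}{i^2}\left[\frac{2\lambda i^2}{(\lambda-it)^3}\right]_{t=0} = \frac{2}{\lambda^2}$
Mean and variance:
$\mu = E(X) = \frac{1}{\lambda}$, $\quad\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$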
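The exponential results can be sketched numerically in R. The sketch uses numerical integration for the definitions and central finite differences of the moment generating function M(t) = λ/(λ − t) at t = 0. The rate λ = 2 and the step h are arbitrary illustrative choices, not values from the exercise.

```r
# Numerical check of the exponential moments, assuming the density
# f(x) = lambda * exp(-lambda * x), x > 0 (the same as dexp(x, rate = lambda))
lambda <- 2                       # arbitrary illustrative rate, lambda > 0

m1 <- integrate(function(x) x   * dexp(x, rate = lambda), lower = 0, upper = Inf)$value
m2 <- integrate(function(x) x^2 * dexp(x, rate = lambda), lower = 0, upper = Inf)$value

print(c(E_X = m1, formula = 1 / lambda))
print(c(E_X2 = m2, formula = 2 / lambda^2))

# MGF M(t) = lambda / (lambda - t), valid for t < lambda; central finite
# differences at t = 0 approximate M'(0) = E(X) and M''(0) = E(X^2)
M <- function(t) lambda / (lambda - t)
h <- 1e-4
d1 <- (M(h) - M(-h)) / (2 * h)
d2 <- (M(h) - 2 * M(0) + M(-h)) / h^2
print(c(d1 = d1, d2 = d2))
```

The finite differences only use values of t close to 0, which lie inside the region t < λ where the MGF exists.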

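(6) The normal distribution
By applying the definitions:
$E(X) = \int_{-\infty}^{\infty} x\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = \int_{-\infty}^{\infty}(t+\mu)\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{t^2}{2\sigma^2}}dt$
$\qquad = \int_{-\infty}^{\infty} t\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{t^2}{2\sigma^2}}dt + \mu\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{t^2}{2\sigma^2}}dt = 0 + \mu = \mu$
Here the change t = x − μ (x = t + μ, dx = dt) has been applied. In the second line, the first integral is zero because the integrand is an odd function and the range of integration is symmetric. The second integral is one because it is the integral of a density function, that of N(0, σ²).
$E(X^2) = \int_{-\infty}^{\infty} x^2\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = \int_{-\infty}^{\infty}(t+\mu)^2\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{t^2}{2\sigma^2}}dt$
$\qquad = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}t^2e^{-\frac{t^2}{2\sigma^2}}dt + 2\mu\int_{-\infty}^{\infty}t\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{t^2}{2\sigma^2}}dt + \mu^2\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{t^2}{2\sigma^2}}dt = \frac{\sigma^2\sqrt{2\pi\sigma^2}}{\sqrt{2\pi\sigma^2}} + 0 + \mu^2 = \sigma^2 + \mu^2$
The first integral has been calculated as follows:
$\int_{-\infty}^{\infty}t^2e^{-\frac{t^2}{2\sigma^2}}dt = \left[-\sigma^2 t\,e^{-\frac{t^2}{2\sigma^2}}\right]_{-\infty}^{\infty} + \sigma^2\int_{-\infty}^{\infty}e^{-\frac{t^2}{2\sigma^2}}dt = 0 + \sigma^2\sqrt{2\sigma^2}\int_{-\infty}^{\infty}e^{-u^2}du = \sigma^2\sqrt{2\sigma^2}\sqrt{\pi} = \sigma^2\sqrt{2\pi\sigma^2}$
Firstly, we have applied integration by parts with:
$u(t) = t \implies u'(t) = 1$; $\quad v'(t) = t\,e^{-\frac{t^2}{2\sigma^2}} \implies v(t) = \int t\,e^{-\frac{t^2}{2\sigma^2}}dt = -\sigma^2 e^{-\frac{t^2}{2\sigma^2}}$
Again, the function $e^x$ grows faster than $x^k$, for any k. Then, we have applied the change $\frac{t}{\sqrt{2\sigma^2}} = u$ ($t = u\sqrt{2\sigma^2}$, $dt = \sqrt{2\sigma^2}\,du$) and the well-known result $\int_{-\infty}^{\infty}e^{-x^2}dx = \sqrt{\pi}$ (see the appendix of Mathematics). On the other hand, these integrals converge, since the integrands decay like $e^{-t^2/(2\sigma^2)}$.
By using the moment generating function:
$M(t) = E(e^{tX}) = \int_{-\infty}^{\infty}e^{tx}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}e^{-\frac{x^2 - 2x[\sigma^2 t + \mu] + \mu^2}{2\sigma^2}}dx$
$\qquad = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}e^{-\frac{\{x - [\sigma^2t + \mu]\}^2 - [\sigma^2t + \mu]^2 + \mu^2}{2\sigma^2}}dx = e^{\frac{[\sigma^2 t + \mu]^2 - \mu^2}{2\sigma^2}}\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}e^{-\frac{\{x-[\sigma^2t+\mu]\}^2}{2\sigma^2}}dx$
$\qquad = e^{t\left[\mu + \frac{\sigma^2 t}{2}\right]}\frac{\sqrt{2\sigma^2}}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}e^{-u^2}du$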

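$\qquad = e^{t\left[\mu + \frac{\sigma^2 t}{2}\right]}\frac{\sqrt{2\sigma^2}\sqrt{\pi}}{\sqrt{2\pi\sigma^2}} = e^{t\left[\mu + \frac{\sigma^2 t}{2}\right]}$
Here we have applied the change $\frac{x - [\sigma^2 t + \mu]}{\sqrt{2\sigma^2}} = u$ ($x = u\sqrt{2\sigma^2} + [\sigma^2 t + \mu]$, $dx = \sqrt{2\sigma^2}\,du$). The integrand suggested completing the square in the exponent. This approach is indicated in Probability and Random Processes, by Grimmett and Stirzaker (Oxford University Press), for the standard normal distribution; we have used the same idea for the general normal distribution. This function exists for any real t. Now, because of the mathematical real analysis:
$E(X) = M'(0) = \left[e^{t\mu + \frac{\sigma^2t^2}{2}}(\mu + \sigma^2 t)\right]_{t=0} = \mu$
$E(X^2) = M''(0) = \left[e^{t\mu + \frac{\sigma^2t^2}{2}}(\mu + \sigma^2 t)^2 + e^{t\mu + \frac{\sigma^2t^2}{2}}\sigma^2\right]_{t=0} = \mu^2 + \sigma^2$
By using the characteristic function:
$\varphi(t) = E(e^{itX}) = \int_{-\infty}^{\infty}e^{itx}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}e^{itx}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = e^{it\left[\mu + \frac{\sigma^2 it}{2}\right]}$
This function exists for any real t. In this case, using the previous calculations with it in place of t leads to the correct result, but that route is not justified as it stands. In complex analysis we can also make a square appear in the exponent and move coefficients outside of the integral. These operations are not trivial generalizations of their analogues in real analysis, and the definitions, properties and results of complex analysis must be taken into account. Moreover, the integrals must be solved in the proper way. For this section, I have consulted Variable compleja y aplicaciones, Churchill, R.V., y J.W. Brown, McGraw-Hill, 5ª ed., and Teoría de las funciones analíticas, Markushevich, A., Mir.
When the following limit exists, the integral can be solved as follows:
$\int_{-\infty}^{\infty}e^{itx}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = \lim_{M\to\infty}\int_{-M}^{M}e^{itx}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx$
Now, by completing a square in the exponent, as for the previous generating functions:
$\int_{-M}^{M}e^{itx}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = e^{it\left[\mu + \frac{\sigma^2 it}{2}\right]}\int_{-M}^{M}e^{-\frac{(x-\mu-i\sigma^2 t)^2}{2\sigma^2}}dx$
Because of the rules of complex analysis, these calculations are similar to those of previous sections, although they rest on new definitions and properties. What is very different is the way of solving the integral. We cannot find an antiderivative of the integrand, as we did for the exponential distribution. Therefore we calculate the integral by considering a contour containing the points $\{x - \mu - i\sigma^2 t,\ -M \le x \le M\}$.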
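The integral of a complex function is null along any closed contour within the domain in which the function is differentiable. We consider the rectangular contour $C = C_I \cup C_{II} \cup C_{III} \cup C_{IV}$, written here for t > 0 (for t < 0 the orientation is simply reversed, and t = 0 is trivial):
$C_I = \{z = \gamma - \mu - i\sigma^2 t,\ -M \le \gamma \le M\}$
$C_{II} = \{z = M - \mu + i(\gamma - \sigma^2 t),\ 0 \le \gamma \le \sigma^2 t\}$
$C_{III} = \{z = \gamma - \mu,\ \gamma \text{ from } M \text{ to } -M\}$
$C_{IV} = \{z = -M - \mu - i\gamma,\ 0 \le \gamma \le \sigma^2 t\}$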

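Then:
$0 = \oint_C f(z)\,dz = \int_{C_I} f(z)\,dz + \int_{C_{II}} f(z)\,dz + \int_{C_{III}} f(z)\,dz + \int_{C_{IV}} f(z)\,dz$
Take $f(z) = e^{-\frac{z^2}{2\sigma^2}}$, which is differentiable in the whole complex plane. Then:
$\int_{C_I} f(z)\,dz = \int_{-M}^{M}e^{-\frac{(x-\mu-i\sigma^2t)^2}{2\sigma^2}}dx$
$\int_{C_{II}} f(z)\,dz = i\int_0^{\sigma^2 t}e^{-\frac{[M-\mu+i(\gamma-\sigma^2 t)]^2}{2\sigma^2}}d\gamma = i\int_0^{\sigma^2 t}e^{-\frac{(M-\mu)^2 - (\gamma - \sigma^2 t)^2 + 2i(M-\mu)(\gamma-\sigma^2t)}{2\sigma^2}}d\gamma$
$\int_{C_{III}} f(z)\,dz = -\int_{-M}^{M}e^{-\frac{(\gamma-\mu)^2}{2\sigma^2}}d\gamma$
$\int_{C_{IV}} f(z)\,dz = -i\int_0^{\sigma^2 t}e^{-\frac{(M+\mu+i\gamma)^2}{2\sigma^2}}d\gamma = -i\int_0^{\sigma^2 t}e^{-\frac{(M+\mu)^2 - \gamma^2 + 2i(M+\mu)\gamma}{2\sigma^2}}d\gamma$
We are interested in the limit as M increases. For the first of the last three integrals:
$\left|\int_{C_{II}} f(z)\,dz\right| \le e^{-\frac{(M-\mu)^2}{2\sigma^2}}\int_0^{\sigma^2 t}e^{\frac{(\gamma-\sigma^2 t)^2}{2\sigma^2}}d\gamma \xrightarrow[M\to\infty]{} 0$
This holds because $e^{ic} = \cos c + i\sin c$ has modulus one for c ∈ ℝ, and because the last integral is finite and does not depend on M (the integrand is continuous and the interval of integration is compact). For the second integral:
$\int_{C_{III}} f(z)\,dz = -\int_{-M}^{M}e^{-\frac{(\gamma-\mu)^2}{2\sigma^2}}d\gamma = -\sqrt{2\sigma^2}\int_{\frac{-M-\mu}{\sqrt{2\sigma^2}}}^{\frac{M-\mu}{\sqrt{2\sigma^2}}}e^{-u^2}du \xrightarrow[M\to\infty]{} -\sqrt{2\sigma^2}\sqrt{\pi} = -\sqrt{2\pi\sigma^2}$
Here the change $\frac{\gamma-\mu}{\sqrt{2\sigma^2}} = u$ ($\gamma = u\sqrt{2\sigma^2} + \mu$, $d\gamma = \sqrt{2\sigma^2}\,du$) has been applied. Finally, for the third integral:
$\left|\int_{C_{IV}} f(z)\,dz\right| \le e^{-\frac{(M+\mu)^2}{2\sigma^2}}\int_0^{\sigma^2 t}e^{\frac{\gamma^2}{2\sigma^2}}d\gamma \xrightarrow[M\to\infty]{} 0$
Again, the last integral is finite and does not depend on M. Therefore $\int_{C_I} f(z)\,dz \to \sqrt{2\pi\sigma^2}$. In short:
$\varphi(t) = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}e^{itx}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = \frac{1}{\sqrt{2\pi\sigma^2}}\lim_{M\to\infty}e^{it\left[\mu + \frac{\sigma^2 it}{2}\right]}\int_{-M}^{M}e^{-\frac{(x-\mu-i\sigma^2t)^2}{2\sigma^2}}dx = \frac{e^{it\left[\mu + \frac{\sigma^2 it}{2}\right]}}{\sqrt{2\pi\sigma^2}}\sqrt{2\pi\sigma^2} = e^{it\left[\mu + \frac{\sigma^2 it}{2}\right]}$
This function exists for any real t. The reader can notice that the correct way is slightly longer. Now:
$E(X) = \frac{\varphi'(0)}{i} = \frac{1}{i}\left[e^{it\left[\mu + \frac{\sigma^2it}{2}\right]}(i\mu + i^2\sigma^2 t)\right]_{t=0} = \frac{i\mu}{i} = \mu$
$E(X^2) = \frac{\varphi''(0)}{i^2} = \frac{1}{i^2}\left[e^{it\left[\mu + \frac{\sigma^2it}{2}\right]}(i\mu + i^2\sigma^2t)^2 + e^{it\left[\mu + \frac{\sigma^2it}{2}\right]}i^2\sigma^2\right]_{t=0} = \frac{i^2\mu^2 + i^2\sigma^2}{i^2} = \mu^2 + \sigma^2$
Mean and variance:
$\mu = E(X) = \mu$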

$\sigma^2 = Var(X) = E(X^2) - E(X)^2 = \sigma^2 + \mu^2 - \mu^2 = \sigma^2$

Conclusion: Different methods can be used to calculate the moments of a probability distribution, and some of them are much more difficult than others. The characteristic function is a complex function of a real variable, which requires theoretical justifications from complex analysis that we must be aware of.

My notes:

[Ap] Mathematics

Remark 1m: The exponential function $e^x$ grows faster than any monomial $x^k$, for any k.

Remark 2m: In complex analysis, there are frequently definitions and properties analogous to those of real analysis. Nevertheless, one must take care before applying them.

Remark 3m: Theoretically, quantities like proportions (sometimes expressed in per cent), rates, statistics, etc., are dimensionless. To interpret a numerical quantity, it is necessary to know the framework in which it is being used. For example, 0.98% and 0.98% are different. The second must be interpreted as 0.98% 0.99%. Thus, the use of a symbol may be useful to track how such quantities are transformed.

Remark 4m: When working with expressions (equations, inequations, sums, limits, integrals, etc.), special attention must be paid when 0 or ∞ appears. For example, even if two limits (series, integrals, etc.) do not exist, their sum (difference, quotient, product, etc.) may exist: $\lim_{n\to\infty} n^3 = \infty$ and $\lim_{n\to\infty} n^4 = \infty$, but $\lim_{n\to\infty} n^3/n^4 = 0$. The same care applies to integrals. On the other hand, many paradoxes (e.g., Zeno's) are based on a wrong step of this kind.

Readers of advanced sections may want to check some theoretical details related to the following items (the very basic theory is not itemized).

Some Reminders

Real Analysis
For real functions of one or several real variables:
Binomial Theorem: $(x + y)^n = \sum_{j=0}^{n}\binom{n}{j}x^j y^{n-j}$ or, equivalently, $(x + y)^n = \sum_{j=0}^{n}\binom{n}{j}x^{n-j}y^j$.
Limits: infinitesimal and infinite quantities.
Integration: methods (integration by substitution, integration by parts, etc.), Fubini's theorem, line integral.
Series: convergence, criteria of convergence, radius of convergence, differentiability and integrability, Taylor series, representation of the exponential function, power series.
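Concretely, the criterion of the quotient (ratio test) can be applied to study the convergence of a power series $\sum_m c_m(x - x_0)^m$. It defines the radius of convergence r as follows:
$\lim_{m\to\infty}\left|\frac{c_{m+1}(x-x_0)^{m+1}}{c_m(x-x_0)^m}\right| = |x - x_0|\lim_{m\to\infty}\left|\frac{c_{m+1}}{c_m}\right| < 1 \iff |x - x_0| < \lim_{m\to\infty}\left|\frac{c_m}{c_{m+1}}\right| = r$
The criterion of the square root (root test) works similarly and gives $r = 1/\lim_{m\to\infty}\sqrt[m]{|c_m|}$.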

Geometric Series. For |b| < 1, $\sum_{j=0}^{\infty} a\,b^j = \frac{a}{1-b}$. See, for example, http://en.wikipedia.org/wiki/geometric_sequence
Arithmetico-Geometric Series. See, for example, http://en.wikipedia.org/wiki/arithmetico-geometric_sequence
Ordinary differential equations.

Complex Analysis
For complex functions of one real variable:
Limits: definitions and basic properties.
Differentiation: definitions and basic properties.
Integration: definitions and basic properties, antiderivatives and Barrow's rule.
For complex functions of one complex variable:
Elementary functions: the complex exponential function.
Limits: definitions and basic properties, infinitesimal and infinite quantities.
Differentiation: definitions and basic properties, holomorphic or analytic functions.
Integration: definitions and basic properties, antiderivatives and Barrow's rule, basic theorems for an analytic function on a closed contour, integration by parts.
Series: convergence and absolute convergence, criteria of convergence, radius of convergence, differentiability and integrability, Taylor series, representation of the exponential function.

Limits
Frequently, we need to deal with limits of sequences and functions. For sequences, the variable or index (say n) and the quantity of interest (say Q) take values in a countable set of discrete values. This remains true in multidimensional situations, since a finite product of countable sets is a countable set. Calculations are easier when there is some monotonicity (since the small steps determine the whole way) or symmetry. For example, the sum and the product of positive terms increase when any term increases (or both do). By contrast, the difference and the quotient may increase or decrease depending on which term increases by a unit, since the two terms do not affect the whole expression in the same direction.

Techniques
In calculating limits, we first try to mentally substitute the value of the variable in the sequence or function. This is frequently enough to solve it, although we can do some formal calculations, especially if we are not totally sure about the value.
When the previous substitution leads to one of the following cases, we talk about indeterminate forms: $\infty - \infty$, $0\cdot\infty$, $\frac{\infty}{\infty}$, $\frac{0}{0}$, $1^{\infty}$, $\infty^0$ and $0^0$. We have not written possible variations of the signs or positions, e.g. $-\infty + \infty$ or $\infty\cdot 0$. The value depends on the particular case, since one term can be faster than the other in

tending to a value. There are different techniques to cope with these limits and to transform some indeterminate forms into others. Notice that limits like $0/\infty$ are not indeterminate forms, since $a/b = a\cdot(1/b)$ tends to $0\cdot 0 = 0$.

Limits in Statistics
Since sample sizes are positive integer numbers, in Statistics we frequently have to deal with limits of sequences.
One-Variable Limits: The variable n takes values in ℕ. For this variable, there is a unique natural way for n to tend to infinity: increasing it one unit at a time. There is a total order in the set ℕ, which is countable. In Statistics, we are usually interested only in nondecreasing sequences of values for n, which can be seen as a sequence of schemes where more and more data are added.
Two-Variable Limits: A pair of values $(n_X, n_Y)$ can be seen as a point in ℕ × ℕ. There are infinitely many ways for $n_X$ and $n_Y$ to tend to infinity, by increasing either of them, or both, one unit at a time. The natural (componentwise) order in the product space ℕ × ℕ is not total, though the space is still countable. Again, in Statistics we are usually interested only in nondecreasing sequences of pairs of values $(n_X, n_Y)$, which can be seen as a sequence of schemes where more and more data are added.
In this document, we work with easy limits or with indeterminate forms like ∞/∞ involving polynomials. For the latter type of limit, we look at the terms with the highest exponents. We then multiply and divide the quotient by the proper monomial so as to identify the negligible terms, which formally can be seen as the use of infinite quantities. We will also mention other techniques.

Technique Based on Paths
One-Variable Limits: Any possible sequence of values for the sample size, say $n_k$, can be seen as a subsequence of the most complete set ℕ of possible values, $n_k = k$. We are especially interested in nondecreasing sequences of values, $n_k \le n_{k+1}$. The evaluation of any one-dimensional quantity at such a subsequence, $Q(n_k)$, can be seen as a subsequence of $Q(k)$. If $Q(k)$ converges, any subsequence like that must converge to the same limit. The opposite is not true, since we can find a nonconvergent $Q(k)$ with a convergent subsequence $Q(n_k)$.
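The last point can be illustrated with a hypothetical quantity chosen only for this example, not taken from the text: $Q(k) = (-1)^k + 1/k$. The sketch below evaluates it along the even sizes $n_k = 2k$ and the odd sizes $2k + 1$. Both are nondecreasing sequences of sample sizes, yet they approach different values, so $Q(k)$ itself has no limit.

```r
# Hypothetical quantity, for illustration only: Q(k) = (-1)^k + 1/k
Q <- function(k) (-1)^k + 1 / k

k <- 10^(1:6)
# Along even sizes n_k = 2k the values approach 1; along odd sizes they approach -1
print(rbind(even = Q(2 * k), odd = Q(2 * k + 1)))
```

Because two subsequences have different limits, the full sequence cannot converge, which is the contrapositive of the statement above.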
The following result can be found in the literature.
Theorem: Let f be a real function of a real variable x, defined on ℝ (f: ℝ → ℝ), and let a be an accumulation point. Then the following conditions are equivalent:
(i) $\lim_{x\to a} f(x) = L$
(ii) For any sequence $\{x_k\}$ in the domain such that $x_k \ne a$ and $\lim_{k\to\infty} x_k = a$, it holds that $\lim_{k\to\infty} f(x_k) = L$.
A sequence is a particular case of a real function of a real variable, and ∞ is an accumulation point of ℕ (in the extended real line).
Two-Variable Limits: Any possible sequence of values $(n_{X,k}, n_{Y,k})$ can be seen as a path s(k) in the most complete set ℕ × ℕ of possible values $(k_1, k_2)$. Again, we are especially interested in nondecreasing sequences of values, $n_{X,k} \le n_{X,k+1}$ and $n_{Y,k} \le n_{Y,k+1}$.

The evaluation of any one-dimensional quantity along a path, $Q(n_{X,k}, n_{Y,k})$, can be seen as a subset of $Q(k_1, k_2)$. The convergence of $Q(n_{X,k}, n_{Y,k})$ may depend on the path s(k). Nevertheless, in some cases the subset $Q(k_1, k_2)$ can be ordered to form a one-index convergent sequence, say $Q(k)$. In those cases, any subsequence $Q(n_{X,k}, n_{Y,k})$ must converge. The opposite is not true, since we can find a nonconvergent $Q(k)$ with a convergent subsequence $Q(n_{X,k}, n_{Y,k})$.

Notice that the set ℕ × ℕ is countable and hence can be linearized, in the sense of being described by using one index only. The theorem above can then be applied. The idea consists in proving the existence of the limit (section (i) in the theorem) by using monotonicity and the properties of sequences. The limit is then calculated by using a particularly appropriate sequence $x_k$ (section (ii) in the theorem). I wrote this argument to prove that the limit of $Q(n_X, n_Y)$ does not depend on the path considered, so that the unique limit can be found along a specially appropriate path.

It is possible to think about an underlying two-dimensional induction principle. Suppose a statement that depends on the position $(n_X, n_Y)$ holds at the starting point and remains true when either variable increases by a unit. Then the statement is true along any one-step path $(n_{X,k}, n_{Y,k})$. The previous nondecreasing paths s(k) are the only paths of interest in Statistics. Mathematically, on the contrary, consider any path s(k) along which the sample sizes tend to infinity. The previous description in terms of steps could be used to prove that the leftward or downward steps must always be compensated and far outnumbered. Finally, steps of any size can be used in a sequence $(n_{X,k}, n_{Y,k})$, since it is always possible to complete them so as to obtain a path $(m_{X,k}, m_{Y,k})$ made of one-sized steps. Thus, if a limit is different for two of those sequences, the limits are also different for the completed paths.

Exercise 1m *
Prove that
(a) $\int_{-\infty}^{\infty}e^{-x^2}dx = \sqrt{\pi}$
(b) $\int_{-\infty}^{\infty}e^{-ax^2}dx = \sqrt{\pi/a}$, with $a \in \mathbb{R}$, $a > 0$
(c) $\int_0^{\infty}e^{-x^2}dx = \frac{\sqrt{\pi}}{2}$

Discussion: The integrand is a continuous function. Recall that $e^{-x^2}$ has no antiderivative expressible in elementary functions, but it is still possible to calculate definite integrals over some domains.
As regards the limits of integration, the domain is infinite and we must deal with improper integrals.
(a) Finiteness: First, we prove that the integral is finite, so as not to work with an equality between two infinite quantities (something really dangerous):
$\int_{-\infty}^{\infty}e^{-x^2}dx = \int_{\{|x|<1\}}e^{-x^2}dx + \int_{\{|x|\ge 1\}}e^{-x^2}dx \le \int_{-1}^{1}dx + 2\int_1^{\infty}e^{-x}dx = 2 + 2[-e^{-x}]_1^{\infty} = 2 + \frac{2}{e} < \infty$
The inequalities hold for the following reasons:
If 0 ≤ |x| < 1, then 0 ≤ x² < 1 and $e^0 \le e^{x^2} < e$, and hence $1 = e^{-0} \ge e^{-x^2} > e^{-1}$.
For an even function, the integral between −k and k is twice the integral between 0 and k.
If x ≥ 1, then x² ≥ x and $e^{x^2} \ge e^{x}$, and hence $e^{-x^2} \le e^{-x}$.
For any quantities, if a ≤ b and a' ≤ b', then a + a' ≤ b + b'.

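Temporary calculations in a two-dimensional space: Fubini's theorem for improper integrals can be applied to do
$$I^2 = \int_{-\infty}^{+\infty} e^{-x^2}dx \int_{-\infty}^{+\infty} e^{-y^2}dy = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{-(x^2+y^2)}\,dx\,dy = \int_{0}^{+\infty}\int_{0}^{2\pi} e^{-[(\rho\cos\theta)^2 + (\rho\sin\theta)^2]}\,\rho\,d\theta\,d\rho$$
$$= \int_{0}^{+\infty}\int_{0}^{2\pi} e^{-\rho^2}\rho\,d\theta\,d\rho = 2\pi\int_{0}^{+\infty} e^{-\rho^2}\rho\,d\rho = 2\pi\left[-\frac{e^{-\rho^2}}{2}\right]_{0}^{+\infty} = \pi.$$
Here the change of variables is $x = \rho\cos\theta$, $y = \rho\sin\theta$, and its Jacobian is
$$|J| = \left|\det\begin{pmatrix} \partial x/\partial\rho & \partial x/\partial\theta \\ \partial y/\partial\rho & \partial y/\partial\theta \end{pmatrix}\right| = \left|\det\begin{pmatrix} \cos\theta & -\rho\sin\theta \\ \sin\theta & \rho\cos\theta \end{pmatrix}\right| = \rho\cos^2\theta + \rho\sin^2\theta = \rho.$$
Coming back to a one-dimensional space: finally, $\int_{-\infty}^{+\infty} e^{-x^2}dx = I = \sqrt{\pi}$.

b) Now, to prove that $\int_{-\infty}^{+\infty} e^{-a x^2}dx = \sqrt{\pi/a}$, we apply the change $\sqrt{a}\,x = u$, which leads to $dx = du/\sqrt{a}$:
$$\int_{-\infty}^{+\infty} e^{-a x^2}dx = \int_{-\infty}^{+\infty} e^{-(\sqrt{a}\,x)^2}dx = \frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} e^{-u^2}du = \frac{\sqrt{\pi}}{\sqrt{a}} = \sqrt{\frac{\pi}{a}}.$$

c) On the other hand, $f(x) = e^{-x^2}$ is an even function, since $f(-x) = e^{-(-x)^2} = e^{-x^2} = f(x)$. Therefore
$$\int_{0}^{+\infty} e^{-x^2}dx = \frac{1}{2}\int_{-\infty}^{+\infty} e^{-x^2}dx = \frac{\sqrt{\pi}}{2}.$$
An alternative proof uses the gamma function $\Gamma(p) = \int_{0}^{+\infty} x^{p-1}e^{-x}dx$ and the fact that $\Gamma(1/2) = \sqrt{\pi}$. We apply the change of variable $x^2 = t$, for $t \ge 0$, which implies that $x = t^{1/2}$ and hence $dx = \frac{1}{2}t^{-1/2}dt$. Then
$$\int_{0}^{+\infty} e^{-x^2}dx = \frac{1}{2}\int_{0}^{+\infty} t^{-1/2}e^{-t}dt = \frac{1}{2}\Gamma\left(\frac{1}{2}\right) = \frac{\sqrt{\pi}}{2}.$$

Conclusion: To be allowed to apply the version of Fubini's theorem for improper integrals, the finiteness of the first integral has been proved first. The integral of section a is used to calculate the others: section b by applying a change of variables, and section c by considering the even character of the integrand.

About the proof based on multiple integration: this proof is due to Siméon Denis Poisson (1781-1840), according to El omnipresente número π, Zhúkov, A.V., URSS. I had found this proof in many books, including the previous reference, which gives the integral of section b with a = 1/2. I have written the bound of the integral.

About the proof based on the gamma function: I found this proof in Problemas de oposiciones: Matemáticas Vol. 6, De Diego y otros, Editorial Deimos. In this textbook, the integral of section c is solved by using the two approaches.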
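The three results can be checked numerically in R, the language used for the code in this document. What follows is a minimal sketch only. `integrate()` performs adaptive quadrature and accepts infinite limits, and `gamma()` is base R's gamma function. The value a = 3 is an arbitrary illustrative choice.

```r
# Numerical check of Exercise 1m (a = 3 is an arbitrary positive constant).
a <- 3
f <- function(x) exp(-x^2)

# Tighter tolerance than the default, so the comparisons below are meaningful.
I_a <- integrate(f, lower = -Inf, upper = Inf, rel.tol = 1e-10)$value     # section a
I_b <- integrate(function(x) exp(-a * x^2), lower = -Inf, upper = Inf,
                 rel.tol = 1e-10)$value                                   # section b
I_c <- integrate(f, lower = 0, upper = Inf, rel.tol = 1e-10)$value        # section c
I_gamma <- gamma(1/2) / 2      # alternative proof of section c: Gamma(1/2)/2

# Upper bound obtained in the finiteness argument of section a: 2 + 2/e
bound <- 2 + 2 / exp(1)

print(c(I_a = I_a, sqrt_pi = sqrt(pi),
        I_b = I_b, sqrt_pi_over_a = sqrt(pi / a),
        I_c = I_c, I_gamma = I_gamma, bound = bound))
```

The bound 2 + 2/e ≈ 2.74 is crude but sufficient: its only role is to guarantee finiteness before Fubini's theorem is used.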

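Exercise 2m
Study the following limits of sequences of one variable:
1) $\lim_{n\to\infty} (a_k n^k + a_{k-1}n^{k-1} + \cdots + a_1 n + a_0)$, where the $a_j$ are constants
2) $\lim_{n\to\infty} \frac{1}{n + c}$, where $c$ is a constant
3) $\lim_{n\to\infty} \frac{an + b}{cn + d}$, where $a$, $b$, $c$ and $d$ are constants
4) $\lim_{n\to\infty} \frac{a n^{k_1} + b(n)}{c n^{k_2} + d(n)}$, where $a$ and $c$ are constants, and $b(n)$ and $d(n)$ are polynomials whose degrees are smaller than $k_1$ and $k_2$, respectively
5) $\lim_{n\to\infty} \frac{a/n + b/n^2}{c/n^3}$, where $a$, $b$ and $c$ are constants

Discussion: We have to study several limits. First, we try to substitute the value to which the variable tends in the expression of the quantity inside the limit. If we are lucky, the value is found and the formal calculations are done later. If not, techniques to solve the indeterminate forms must be applied.

1) $\lim_{n\to\infty}(a_k n^k + \cdots + a_1 n + a_0)$, where the $a_j$ are constants
Way 0: Intuitively, the term with the largest exponent leads the growth when $n$ tends to infinity. Then, for $k \ge 1$,
$$\lim_{n\to\infty}(a_k n^k + \cdots + a_1 n + a_0) = \begin{cases} -\infty & \text{if } a_k < 0 \\ +\infty & \text{if } a_k > 0 \end{cases}$$
Necessity: $\lim (a_k n^k + \cdots + a_0) = \pm\infty \Rightarrow n \to \infty$. If not, that is, if $\exists M > 0$ such that $n < M < \infty$, then
$$|a_k n^k + \cdots + a_1 n + a_0| \le |a_k| n^k + \cdots + |a_1| n + |a_0| \le |a_k| M^k + \cdots + |a_1| M + |a_0| < \infty$$
and the limit could not be infinite.

2) $\lim_{n\to\infty} \frac{1}{n + c}$, where $c$ is a constant
Way 0: Intuitively, the denominator tends to infinity while the numerator does not. For huge $n$, the value of $c$ is negligible. Then, the limit is zero.
Way 1: Formally, we divide the numerator and the denominator (all their terms) by $n$:
$$\lim_{n\to\infty} \frac{1}{n + c} = \lim_{n\to\infty} \frac{1/n}{1 + c/n} = \frac{0}{1} = 0.$$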

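Way 2: By using infinites of the same order, we can substitute $n + c$ by $n$:
$$\lim_{n\to\infty} \frac{1}{n + c} = \lim_{n\to\infty} \frac{1}{n} = 0.$$
Necessity: $\lim \frac{1}{n + c} = 0 \Rightarrow n \to \infty$. If not, that is, if $\exists M > 0$ such that $n < M < \infty$, then (for $n + c > 0$) $\frac{1}{n + c} > \frac{1}{M + c} > 0$, and the limit could not be zero.

3) $\lim_{n\to\infty} \frac{an + b}{cn + d}$, where $a$, $b$, $c$ and $d$ are constants
Way 0: This limit includes the previous one. The quotient is an indeterminate form ($\infty/\infty$). Intuitively, the numerator increases like $an$ and the denominator like $cn$. The terms $b$ and $d$ are negligible for huge $n$. Then, the quotient tends to $a/c$.
Way 1: Formally, we divide the numerator and the denominator (all their terms) by $n$:
$$\lim_{n\to\infty} \frac{an + b}{cn + d} = \lim_{n\to\infty} \frac{a + b/n}{c + d/n} = \frac{a}{c}.$$
Way 2: By using infinites,
$$\lim_{n\to\infty} \frac{an + b}{cn + d} = \lim_{n\to\infty} \frac{an}{cn} = \frac{a}{c}.$$
Necessity: $\lim \frac{an + b}{cn + d} = \frac{a}{c} \Rightarrow n \to \infty$. If not, that is, if $\exists M > 0$ such that $n < M < \infty$, then
$$\left|\frac{an + b}{cn + d} - \frac{a}{c}\right| = \left|\frac{acn + bc - acn - ad}{c(cn + d)}\right| = \frac{|bc - ad|}{|c|\,|cn + d|} > \frac{|bc - ad|}{|c|\,(|c|M + |d|)} > 0$$
and the limit could not be $a/c$, unless the original quotient was always equal to this value. Notice what happens when the previous numerator is zero, that is, when $ad - bc = 0$:
$$\frac{a}{c} = \frac{b}{d} = \lambda \;\Rightarrow\; \begin{cases} a = \lambda c \\ b = \lambda d \end{cases} \;\Rightarrow\; \frac{an + b}{cn + d} = \frac{\lambda(cn + d)}{cn + d} = \lambda = \frac{a}{c}.$$
In this case the function is really a constant. In the initial statement, the condition $ad - bc \neq 0$ could have been added for the polynomials $an + b$ and $cn + d$ to be independent.

4) $\lim_{n\to\infty} \frac{a n^{k_1} + b(n)}{c n^{k_2} + d(n)}$, where $a$ and $c$ are constants, and $b(n)$ and $d(n)$ are polynomials whose degrees are smaller than $k_1$ and $k_2$, respectively
Way 0: This limit includes the two previous ones. The quotient is an indeterminate form. Intuitively, the numerator increases like $a n^{k_1}$ and the denominator like $c n^{k_2}$, while $b(n)$ and $d(n)$ are negligible. Thus,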

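$$\lim_{n\to\infty} \frac{a n^{k_1} + b(n)}{c n^{k_2} + d(n)} = \begin{cases} 0 & \text{if } k_1 < k_2 \\ a/c & \text{if } k_1 = k_2 \\ -\infty & \text{if } k_1 > k_2,\ a/c < 0 \\ +\infty & \text{if } k_1 > k_2,\ a/c > 0 \end{cases}$$
Way 1: Formally, we divide the numerator and the denominator (all their terms) by the power of $n$ with the highest degree among all the terms of the quotient. If there were products, we should imagine how the monomials look once expanded. For example, for the case $k_1 < k_2$:
$$\lim_{n\to\infty} \frac{a n^{k_1} + b(n)}{c n^{k_2} + d(n)} = \lim_{n\to\infty} \frac{a n^{k_1}/n^{k_2} + b(n)/n^{k_2}}{c + d(n)/n^{k_2}} = \frac{0 + 0}{c + 0} = 0.$$
Similarly for the other cases.
Way 2: By using infinites, since $b(n)$ and $d(n)$ are negligible for huge $n$,
$$\lim_{n\to\infty} \frac{a n^{k_1} + b(n)}{c n^{k_2} + d(n)} = \lim_{n\to\infty} \frac{a n^{k_1}}{c n^{k_2}} = \lim_{n\to\infty} \frac{a}{c}\,n^{k_1 - k_2} = \begin{cases} 0 & \text{if } k_1 < k_2 \\ a/c & \text{if } k_1 = k_2 \\ -\infty & \text{if } k_1 > k_2,\ a/c < 0 \\ +\infty & \text{if } k_1 > k_2,\ a/c > 0 \end{cases}$$

5) $\lim_{n\to\infty} \frac{a/n + b/n^2}{c/n^3}$, where $a$, $b$ and $c$ are constants
Way 0: The quotient is an indeterminate form ($0/0$). Intuitively, the numerator decreases like $a/n$ (the slowest term) and the denominator like $c/n^3$. The denominator becomes smaller and smaller with respect to the numerator. As a consequence, the limit is $-\infty$ or $+\infty$, depending on whether $a/c$ is negative or positive, respectively.
Way 1: Formally, it is always possible to multiply or divide the numerator and the denominator by the power of $n$ with the appropriate exponent. This applies to all their monomials, if they are summations, or to any element, if they are products. Then we can do
$$\lim_{n\to\infty} \frac{a/n + b/n^2}{c/n^3} = \lim_{n\to\infty} \frac{a n^2 + b n}{c} = \begin{cases} -\infty & \text{if } a/c < 0 \\ +\infty & \text{if } a/c > 0 \end{cases}$$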
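The case analysis of limits 3 to 5 can be illustrated numerically. The following is a sketch in R. All constants are arbitrary examples chosen for the demonstration; none of them come from the text. Evaluating the quotients at n = 10, 100, ..., 10^6 shows them approaching a/c or 0, staying exactly constant in the degenerate case ad − bc = 0, or diverging like (a/c)n².

```r
# Illustrative constants only (hypothetical, not taken from the exercise).
n <- 10^(1:6)

# Limit 3 with a = 3, b = -7, c = 2, d = 5, so ad - bc = 15 - (-14) = 29 != 0
lim3 <- (3 * n - 7) / (2 * n + 5)
# Limit 3 with a = 3, b = 6, c = 2, d = 4, so ad - bc = 0: constant quotient a/c
lim3_const <- (3 * n + 6) / (2 * n + 4)

# Limit 4 with k1 < k2: (2 n^2 + n)/(5 n^3 - 1) -> 0
lim4_small <- (2 * n^2 + n) / (5 * n^3 - 1)
# Limit 4 with k1 = k2: (2 n^3 + n)/(5 n^3 - 1) -> a/c = 2/5
lim4_equal <- (2 * n^3 + n) / (5 * n^3 - 1)

# Limit 5 with a = 1, b = 4, c = -2: behaves like (a/c) n^2, so it goes to -Inf
lim5 <- (1 / n + 4 / n^2) / (-2 / n^3)

print(rbind(lim3, lim3_const, lim4_small, lim4_equal, lim5))
```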

Way 2: By using infinitesimals,
$$\lim_{n\to\infty} \frac{a/n + b/n^2}{c/n^3} = \lim_{n\to\infty} \frac{a/n}{c/n^3} = \lim_{n\to\infty} \frac{a}{c}\,n^2 = \begin{cases} -\infty & \text{if } a/c < 0 \\ +\infty & \text{if } a/c > 0 \end{cases}$$

Conclusion: We have studied the limits proposed. Some of them were almost trivial, while others involved indeterminate forms like 0/0 or ∞/∞. The cases were quotients of polynomials, so limits of the former form have been transformed into limits of the latter form. To solve these cases, the technique of multiplying and dividing by the same quantity has sufficed, although there are other techniques, e.g. L'Hôpital's rule.

Exercise 3m *
Study the following limits of sequences of two variables, where $n_X$ and $n_Y$ are sample sizes:
1) $\lim (n_X + n_Y)$ and $\lim (n_X - n_Y)$
2) $\lim \ldots$ and $\lim \ldots$
3) $\lim \ldots$ and $\lim \ldots$
4) $\lim \ldots$ and $\lim \ldots$, where $a$, $b$ and $c$ are constants

5) $\lim \ldots$ and $\lim \ldots$
6) $\lim \ldots$ and $\lim \ldots$
7) $\lim \ldots$ and $\lim \ldots$, where $a$, $b$ and $c$ are constants
8) $\lim \frac{1/n_X + 1/n_Y}{1/n_X}$ and $\lim \frac{1/n_X}{1/n_X + 1/n_Y}$
9) $\lim \frac{n_X + n_Y}{n_X n_Y}$ and $\lim \frac{n_X n_Y}{n_X + n_Y}$
10) $\lim \frac{n_X - n_Y}{n_X n_Y}$ and $\lim \frac{n_X n_Y}{n_X - n_Y}$

Discussion: We have to study several limits of two-variable sequences. First, we try to substitute the value to which the variables tend in the expression of the quantity inside the limit. If we are lucky, the value is found and the formal calculations are done later. If not, techniques to solve the indeterminate forms must be applied. These limits may be much more difficult than those for one variable:
- if the limit exists (or is infinite), we need to prove that the value does not depend on the particular way in which the sample sizes tend to infinity;
- if the limit does not exist, we need to find two ways such that different values are obtained.

1) $\lim (n_X + n_Y)$ and $\lim (n_X - n_Y)$
Way 0: Intuitively, the first limit is infinite while the second does not exist, since it depends on which variable increases faster.
Way 1: For the first limit to be infinite, it is necessary and sufficient that one variable tends to infinity, say $n_X$:
$$\lim (n_X + n_Y) > \lim n_X = \infty.$$
For the necessity, if $n_X < M < \infty$ and $n_Y < M < \infty$, then $\lim (n_X + n_Y) < 2M < \infty$, and the limit could not be infinite. To see that $\lim (n_X - n_Y)$ does not exist, it is enough to see that different values are obtained along different paths, for example $s_1(k) = (2k, k)$ and $s_2(k) = (k, k)$.

For these paths, $\lim_{k\to\infty}(n_X - n_Y)$ along $s_1(k) = (2k, k)$ is $\lim_{k\to\infty}(2k - k) = \lim_{k\to\infty} k = \infty$, while along $s_2(k) = (k, k)$ it is $\lim_{k\to\infty}(k - k) = 0$.

2) $\lim \ldots$ and $\lim \ldots$
Way 0: Intuitively, the first limit is infinite while the second does not exist, since it depends on which variable increases faster.
Way 1: For the first limit to be infinite, it is necessary and sufficient that one variable tends to infinity, say $n_X$. For the necessity, if both $n_X$ and $n_Y$ were bounded by some $M < \infty$, the first quantity would be bounded and the limit could not be infinite. To see that the second limit does not exist, it is enough to see that different values are obtained along two different paths $s_1(k)$ and $s_2(k)$.

3) $\lim \ldots$ and $\lim \ldots$
Way 0: Even if the expression can be simplified, we use this case to show two facts. A product of increasing terms increases faster than any of its terms, and the new rate is the product of the two rates (the exponents are added). The quotient is an indeterminate form. The first limit seems infinite and the second zero.
Way 1: Formally, we simplify the quotients. A product of increasing terms that are bigger than one increases faster than any of its terms. The second limit can also be seen as the inverse of the first. The sufficiency and the necessity in these limits are determined by the behaviour of the sample size that remains after simplifying: the first limit is infinite, and the second is zero, if and only if that size tends to infinity.

4) $\lim \ldots$ and $\lim \ldots$, where $a$, $b$ and $c$ are constants
Way 0: The quotient is an indeterminate form. Intuitively, the product of increasing terms increases faster than any of its terms, and the new rate is the product of the two rates (the exponents are added). The constants are negligible when they are added to or subtracted from a power. The first limit seems infinite and the second zero.
Way 1: Formally, we multiply the numerator and the denominator by the product of the powers of $n_X$ and $n_Y$ with the highest exponents. This applies to all their monomials, if they are summations, or to any element, if they are products.

The resulting quotient shows that the first limit is infinite.
Way 2: By using infinites, the constants can be neglected and the same conclusion is obtained. The second limit can also be seen as the inverse of the first, by changing the letters of the constants, so we do not repeat the calculations. The sufficiency and the necessity in these limits are determined by one of the sample sizes: the first limit is infinite, and the second is zero, if and only if that size tends to infinity.

5) $\lim \ldots$ and $\lim \ldots$
Way 0: Even if the expression can be simplified, we use this case to show two facts. A product of decreasing terms decreases faster than any of its terms, and the new rate is the product of the two (the exponents are added). The quotient is an indeterminate form. The first limit seems zero and the second infinite.
Way 1: Formally, we simplify the quotients. A product of decreasing terms that are smaller than one decreases faster than any of its terms. The second limit can also be seen as the inverse of the first. The sufficiency and the necessity in these limits are determined by the behaviour of the sample size that remains after simplifying: the first limit is zero, and the second is infinite, if and only if that size tends to infinity.

6) $\lim \ldots$ and $\lim \ldots$
The quotient is an indeterminate form. Both quantities can be rewritten as quotients whose limits, as we have seen, do not exist. Therefore it seems that neither of the limits exists. Formally, we could consider the same paths as we considered there. The second limit can also be seen as the inverse of the first.

7) $\lim \ldots$ and $\lim \ldots$, where $a$, $b$ and $c$ are constants

Way 0: The constants are negligible, and these limits are like the previous ones: the first limit seems zero and the second infinite.
Way 1: Formally, we multiply the numerator and the denominator by the product of the powers of $n_X$ and $n_Y$ with the highest exponents. This applies to all their monomials, if they are summations, or to any element, if they are products. The first limit is then seen to be zero.
Way 2: By using infinites, the constants are neglected and the first limit is again zero.
The second limit can also be seen as the inverse of the first, by changing the letters of the constants, so we do not repeat the calculations. As regards the sufficiency and the necessity in these limits, they are determined by the behaviour of one of the sample sizes: the first limit is zero, and the second is infinite, if and only if that size tends to infinity.

8) $\lim \frac{1/n_X + 1/n_Y}{1/n_X}$ and $\lim \frac{1/n_X}{1/n_X + 1/n_Y}$
Way 0: The quotient is an indeterminate form. Intuitively, any sum of decreasing terms decreases like the slowest one, while the other becomes negligible. Thus:
- the first limit would be one if $1/n_Y$ decreases faster;
- it would be infinite if $1/n_X$ decreases faster;
- if both decrease at the same rate, the limits are two and one half, respectively.

In short, it seems this limit does not exist.
Way 1: Formally, we can do
$$\lim \frac{1/n_X + 1/n_Y}{1/n_X} = \lim\left(1 + \frac{n_X}{n_Y}\right) = 1 + \lim \frac{n_X}{n_Y},$$
and $\lim n_X/n_Y$ does not exist, since it depends on the path. For instance, it equals 2 along $(2k, k)$ and 1 along $(k, k)$. The second limit can also be seen as the inverse of the first.

9) $\lim \frac{n_X + n_Y}{n_X n_Y}$ and $\lim \frac{n_X n_Y}{n_X + n_Y}$
This limit appears in the variance of the estimators of $\sigma_X^2/\sigma_Y^2$. We solve it in two simple ways, although other ways are also considered as an intellectual exercise.

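Way 0: Intuitively, the product changes faster than the summation. Then, the first limit seems zero and the second infinite.
Way 1: Formally, we can do
$$\lim \frac{n_X + n_Y}{n_X n_Y} = \lim\left(\frac{n_X}{n_X n_Y} + \frac{n_Y}{n_X n_Y}\right) = \lim\left(\frac{1}{n_Y} + \frac{1}{n_X}\right) = \lim \frac{1}{n_Y} + \lim \frac{1}{n_X} = 0 + 0 = 0,$$
and the second limit, being the inverse, is infinite. It is necessary and sufficient that both variables tend to infinity. For the necessity, if $n_X < M < \infty$, then
$$\frac{n_X + n_Y}{n_X n_Y} > \frac{1}{n_X} > \frac{1}{M} > 0,$$
and the limit could not be zero.

Way 2: First, let us suppose, without loss of generality, that $n_Y \le n_X$. Then
$$0 \le \lim \frac{n_X + n_Y}{n_X n_Y} \le \lim \frac{2 n_X}{n_X n_Y} = \lim \frac{2}{n_Y} = 0.$$
Here $n_X$ has been cancelled in the numerator and the denominator; it is not that an iterated limit is being calculated. Nonetheless, this solution does not consider those paths that cross the bisector line, that is, paths along which neither size is uniformly behind the other. To complete the proof, it is enough to use again the symmetry of the expression with respect to the two variables (it is the same if we switch them). For any sequence of values of $(n_X, n_Y)$ crossing the bisector line, an equivalent sequence can be considered, lying either above or below the bisector line, by looking at the bisector line as a mirror or a barrier. The sequences are equivalent in the sense that they take the same values.

Way 3: Polar coordinates can also be used to study this limit. For any sequence $s_k = (n_{X,k}, n_{Y,k})$,
$$\begin{cases} n_{X,k} = \rho_k\cos(\alpha_k) \\ n_{Y,k} = \rho_k\sin(\alpha_k) \end{cases},\quad 0 < \rho_k < \infty,\ 0 < \alpha_k < \frac{\pi}{2} \qquad\Longleftrightarrow\qquad \begin{cases} \rho_k = \sqrt{n_{X,k}^2 + n_{Y,k}^2} \\ \alpha_k = \operatorname{arctg}\left(\frac{n_{Y,k}}{n_{X,k}}\right) \end{cases}$$
A mathematical characterization of a sequence $s_k$ whose sample sizes tend to infinity is $\rho_k \to \infty$, in such a way that the products $\rho_k\cos(\alpha_k)$ and $\rho_k\sin(\alpha_k)$ still tend to infinity even when $\cos(\alpha_k) \to 0$ or $\sin(\alpha_k) \to 0$. Then, the limit is calculated as follows:
$$\lim_{k\to\infty} \frac{\rho_k[\cos(\alpha_k) + \sin(\alpha_k)]}{\rho_k^2\cos(\alpha_k)\sin(\alpha_k)} = \lim_{k\to\infty}\left[\frac{1}{\rho_k\sin(\alpha_k)} + \frac{1}{\rho_k\cos(\alpha_k)}\right] = 0.$$
The only cases that could cause trouble would be those for which either the cosine or the sine tends to zero (and the other tends to one). Nevertheless, the characterization above shows that the denominators would still tend to infinity. Finally, as regards the necessity, let us suppose, without loss of generality, that $n_X \le M < \infty$. Then, since $\rho_k \to \infty$, it must be $\cos(\alpha_k) \to 0$ in such a way that $\rho_k\cos(\alpha_k) \le M$.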
As a consequence,
$$\lim_{k\to\infty} \frac{\rho_k[\cos(\alpha_k) + \sin(\alpha_k)]}{\rho_k^2\cos(\alpha_k)\sin(\alpha_k)} = \lim_{k\to\infty}\left[\frac{1}{\rho_k\sin(\alpha_k)} + \frac{1}{\rho_k\cos(\alpha_k)}\right] \ge \frac{1}{M} > 0.$$
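Ways 1 to 3 can be explored numerically. Along several paths in which both sizes grow, the quantity $(n_X + n_Y)/(n_X n_Y)$ goes to zero. When $n_X$ stays bounded by M, the quantity stays above $1/M$. The following R sketch is an illustration, not a proof. The particular paths and the bound M = 5 are arbitrary choices.

```r
# Q(nX, nY) = (nX + nY) / (nX * nY), the quantity of limit 9.
Q9 <- function(nX, nY) (nX + nY) / (nX * nY)

k <- 1:10000
along <- list(
  diagonal   = Q9(k, k),                  # path (k, k)
  unbalanced = Q9(k, k^2),                # path (k, k^2): nY grows much faster
  slow       = Q9(k, ceiling(sqrt(k)))    # path (k, ceiling(sqrt(k))): nY grows slowly
)
print(sapply(along, function(q) q[length(q)]))   # last value along each path

# Necessity: if nX stays bounded (here nX = M), Q never goes below 1/M.
M <- 5
bounded <- Q9(M, k)
print(min(bounded))
```

The "slow" path converges much more slowly, since its value is roughly $1/\sqrt{k}$, but it still tends to zero because both sizes diverge.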

Way 4: Intuitively:
a) The mean square error, and hence the sequence in the limit, should monotonically decrease with the sample sizes.
b) We are working with nonnegative quantities, so there is a lower bound.
c) It is a well-known result that a nonincreasing, bounded sequence always converges.
d) The limit of a sequence, when it exists, is unique. As a consequence, it can be calculated by using any subsequence, concretely an appropriate simple one. The opposite is not true: the convergence of one subsequence does not imply that the whole sequence converges.

First, when $n_X$ increases by one unit, the sequence decreases:
$$\frac{(n_X + 1) + n_Y}{(n_X + 1)n_Y} < \frac{n_X + n_Y}{n_X n_Y} \iff n_X(n_X + 1 + n_Y) < (n_X + 1)(n_X + n_Y) \iff n_X^2 + n_X + n_X n_Y < n_X^2 + n_X n_Y + n_X + n_Y \iff 0 < n_Y,$$
which is true. Since the expression of the sequence is symmetric, the same inequality is true when $n_Y$ increases by one unit. Finally, the case when both sizes increase by one unit can always be decomposed into two of the previous steps. The quantity $Q(n_X, n_Y)$ depends only on the position, not on the way of arriving at it. Thus, the sequence decreases in this case too.

Second, Q takes values in a discrete set. This set can sequentially be constructed and ordered to form a strictly decreasing, bounded sequence, say $Q_k$. (The set $\mathbb{N} \times \mathbb{N}$ is countable. The symmetry implies that the increase of Q can take only two values, not three, when one sample size or both increase by one unit.) In short, $Q_k$ converges, though we need not build it.

Third, any path $s_k$ whose sample sizes are nondecreasing and tend to infinity can be written in terms of one-unit rightward and upward steps, with an infinite number of steps of each type. For each path $s_k$, the quantity
$$Q(s_k) = \frac{n_{X,k} + n_{Y,k}}{n_{X,k}\,n_{Y,k}}$$
can be seen as a subsequence of $Q_k$.

Finally, the limit of Q is unique, and the case $(k, k)$ indicates that it is zero:
$$\lim_{k\to\infty} \frac{k + k}{k \cdot k} = \lim_{k\to\infty} \frac{2}{k} = 0.$$
As for the necessity of both sample sizes tending to infinity, let us suppose, without loss of generality, that $n_X \le M < \infty$. There would be a subsequence that cannot tend to zero:
$$\lim_{k\to\infty} \frac{n_{X,k} + n_{Y,k}}{n_{X,k}\,n_{Y,k}} \ge \lim_{k\to\infty} \frac{n_{Y,k}}{M\,n_{Y,k}} = \frac{1}{M} > 0,$$
whatever the behaviour of $n_{Y,k}$. The previous nondecreasing $s_k$ are the only paths of interest in Statistics.
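Way 4 rests on two facts. First, Q decreases whenever either sample size, or both, increases by one unit. Second, Q is symmetric in its two arguments. Both facts can be spot-checked in R on a finite grid, here $1 \le n_X, n_Y \le 200$. This is a minimal sanity check of the algebra on a bounded region, not a substitute for the proof.

```r
Q9 <- function(nX, nY) (nX + nY) / (nX * nY)
g <- expand.grid(nX = 1:200, nY = 1:200)

right     <- with(g, Q9(nX + 1, nY) < Q9(nX, nY))      # one-unit rightward step
up        <- with(g, Q9(nX, nY + 1) < Q9(nX, nY))      # one-unit upward step
both      <- with(g, Q9(nX + 1, nY + 1) < Q9(nX, nY))  # both sizes increase
symmetric <- with(g, Q9(nX, nY) == Q9(nY, nX))         # switching the variables

print(c(right = all(right), up = all(up), both = all(both),
        symmetric = all(symmetric)))

# The appropriate simple subsequence used above: the diagonal path (k, k), Q = 2/k.
k <- 1:1000
diag_path <- Q9(k, k)
```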
10) $\lim \frac{n_X - n_Y}{n_X n_Y}$ and $\lim \frac{n_X n_Y}{n_X - n_Y}$
Way 0: Intuitively, the limit of the difference $n_X - n_Y$ does not exist, since it takes different values depending on the path. However, the difference (or the summation, in the previous section) is so much smaller than the product that the first limit seems zero, while the second seems infinite. Formally, we can do calculations as for the previous limit, for example
$$\lim \frac{n_X - n_Y}{n_X n_Y} = \lim\left(\frac{1}{n_Y} - \frac{1}{n_X}\right) = \lim \frac{1}{n_Y} - \lim \frac{1}{n_X} = 0 - 0 = 0,$$
or, alternatively, use the bound:

$$0 \le \lim \frac{|n_X - n_Y|}{n_X n_Y} \le \lim \frac{n_X + n_Y}{n_X n_Y} = 0.$$

Conclusion: We have studied the limits proposed. Some of them were almost trivial, while others involved indeterminate forms like 0/0 or ∞/∞. Many of the cases were quotients of polynomials, so limits of the former form have been transformed into limits of the latter form. To solve these cases, the technique of multiplying and dividing by the same quantity has sufficed, although there are other techniques, e.g. L'Hôpital's rule. Other techniques have been applied too.

Additional Examples: Several limits have been solved in the exercises (look for "limit" in the final index).

Exercise 4m *
For two positive integers $n_X$ and $n_Y$, find the discrete frontier and the two regions determined by the equality
$$2(n_X + n_Y) = (n_X - n_Y)^2.$$

Discussion: Both sides of the expression are symmetric with respect to the variables, meaning that they stay the same if the two variables are switched. This implies that the frontier we are looking for is symmetric with respect to the bisector line. The square suggests a parabolic curve, while the combinations $n_X - n_Y$ and $n_X + n_Y$ suggest a sort of transformation of a conic curve. Intuitively, in the region around the bisector line, the difference of the variables is small, and therefore the right-hand side of the original equality is smaller than the left-hand side. Obviously, the other region lies on the other side of the discrete frontier.

Purely computational approach: In a previous exercise we wrote some brute-force lines for the computer to plot the points of the frontier. Here we use the same code to plot the inner region:

    N <- 100
    vectorNX <- vector(mode = "numeric", length = 0)
    vectorNY <- vector(mode = "numeric", length = 0)
    for (x in 1:N) {
      for (y in 1:N) {
        # inner region: the left-hand side exceeds the right-hand side
        if (2*(x + y) > (x - y)^2) {
          vectorNX <- c(vectorNX, x)
          vectorNY <- c(vectorNY, y)
        }
      }
    }
    plot(vectorNX, vectorNY, xlim = c(0, N), ylim = c(0, N),
         xlab = 'x', ylab = 'y', main = 'Regions', type = 'p')

Algebraical-computational approach: Before using the computer, we can do some algebraic work:
$$2(n_X + n_Y) = (n_X - n_Y)^2 \iff n_Y^2 - 2(n_X + 1)\,n_Y + n_X^2 - 2n_X = 0 \iff n_Y = \frac{2(n_X + 1) \pm \sqrt{4(n_X + 1)^2 - 4(n_X^2 - 2n_X)}}{2} = n_X + 1 \pm \sqrt{4n_X + 1}.$$
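The two branches $n_Y = n_X + 1 \pm \sqrt{4n_X + 1}$ give integers exactly when $4n_X + 1$ is a perfect square. The following R sketch cross-checks this closed form against a brute-force search for the equality $2(n_X + n_Y) = (n_X - n_Y)^2$ over the grid $1 \le n_X, n_Y \le 100$. The helper `key()` is hypothetical, introduced only to compare the two point sets.

```r
N <- 100

# Brute force: integer points of the grid lying exactly on the frontier
grid <- expand.grid(x = 1:N, y = 1:N)
brute <- grid[with(grid, 2 * (x + y) == (x - y)^2), ]

# Closed form: y = x + 1 +/- sqrt(4x + 1), kept when integer and inside the grid
x <- 1:N
r <- sqrt(4 * x + 1)
ok <- r == round(r)                      # 4x + 1 is a perfect square
cand <- rbind(data.frame(x = x[ok], y = x[ok] + 1 + r[ok]),
              data.frame(x = x[ok], y = x[ok] + 1 - r[ok]))
formula_pts <- cand[cand$y >= 1 & cand$y <= N, ]

# Hypothetical helper: canonical text form of a set of points, for comparison
key <- function(d) sort(paste(d$x, d$y))
print(identical(key(brute), key(formula_pts)))
print(formula_pts[order(formula_pts$x), ])
```

The points found are pairs of consecutive numbers of the form m(m + 1), such as (2, 6), (6, 12) and (12, 20), together with their mirror images with respect to the bisector line.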

The following code plots the two branches of the frontier:

    N <- 100
    vectorNX <- seq(1, N)
    vectorNYpos <- vectorNX + 1 + sqrt(4*vectorNX + 1)
    vectorNYneg <- vectorNX + 1 - sqrt(4*vectorNX + 1)
    # keep only the values of nX for which both branches are integers
    integerSolutions <- vectorNYpos == round(vectorNYpos)
    yl <- c(0, max(vectorNYpos[integerSolutions], vectorNYneg[integerSolutions]))
    plot(vectorNX[integerSolutions], vectorNYpos[integerSolutions], xlim = c(0, N),
         ylim = yl, xlab = 'x', ylab = 'y', main = 'Frontier', type = 'p')
    points(vectorNX[integerSolutions], vectorNYneg[integerSolutions])

Algebraical, analytical and geometrical approach: The change of variables $C: (n_X, n_Y) \mapsto (u, v) = (n_X - n_Y, n_X + n_Y)$ is a linear transformation. The new frontier can be written as the parabolic curve $v = u^2/2$. The computer allows plotting this frontier in the U-V plane:

    N <- 50
    vectorU <- seq(-N, N)
    vectorV <- 0.5*vectorU^2
    plot(vectorU, vectorV, xlim = c(-N - 1, N + 1), ylim = c(0, max(vectorV)),
         xlab = 'u', ylab = 'v', main = 'Frontier', type = 'p')

How should the change of variables be interpreted? We can write
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} n_X \\ n_Y \end{pmatrix}.$$
The previous matrix reminds us of a rotation in the plane. However, movements have orthonormal matrices, while the columns of this one are orthogonal but not unit vectors. Let us have a look at how a triangle (a rigid polygon) is transformed:

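$$P_1 = (1, 1) \mapsto C(P_1) = (0, 2), \qquad P_2 = (2, 1) \mapsto C(P_2) = (1, 3), \qquad P_3 = (1, 2) \mapsto C(P_3) = (-1, 3).$$
To confirm that C is a rotation plus a dilatation (homothetic transformation), or vice versa, we consider the distances between points, the linearity, and a rotation of the axes.

First, let $A = (a_1, a_2) \mapsto \tilde A = (a_1 - a_2, a_1 + a_2)$ and $B = (b_1, b_2) \mapsto \tilde B = (b_1 - b_2, b_1 + b_2)$. Then
$$d_{u,v}(\tilde A, \tilde B) = \sqrt{[(b_1 - b_2) - (a_1 - a_2)]^2 + [(b_1 + b_2) - (a_1 + a_2)]^2} = \sqrt{[(b_1 - a_1) - (b_2 - a_2)]^2 + [(b_1 - a_1) + (b_2 - a_2)]^2}$$
$$= \sqrt{2}\sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2} = \sqrt{2}\,d_{x,y}(A, B).$$
This means that the previous change of variables is not an isometry. Therefore, technically, it cannot be considered a movement in the plane. Nonetheless, the previous lines show that the linear transformation $C/\sqrt{2}: (n_X, n_Y) \mapsto (u, v)/\sqrt{2}$ respects the distances. So $C/\sqrt{2}$ is an isometry whose matrix is orthonormal, that is, a movement. Now, C can be written as
$$\begin{pmatrix} u \\ v \end{pmatrix} = \sqrt{2}\begin{pmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix}\begin{pmatrix} n_X \\ n_Y \end{pmatrix},$$
where the matrix is the expression of a rotation in the plane (see the literature on Linear Algebra).

Second, linearity implies that both $C$ and $C/\sqrt{2}$ transform lines into lines. Consider the expression
$$A + \lambda\overrightarrow{AB} = (a_1, a_2) + \lambda(b_1 - a_1, b_2 - a_2) = \big(\lambda b_1 + (1 - \lambda)a_1,\ \lambda b_2 + (1 - \lambda)a_2\big).$$
It determines the line containing A and B if $\lambda \in \mathbb{R}$, and the segment from A to B if $\lambda \in [0, 1]$. It is transformed as follows:
$$C\big(\lambda b_1 + (1 - \lambda)a_1,\ \lambda b_2 + (1 - \lambda)a_2\big) = \big(\lambda(b_1 - b_2) + (1 - \lambda)(a_1 - a_2),\ \lambda(b_1 + b_2) + (1 - \lambda)(a_1 + a_2)\big) = \lambda\,C(b_1, b_2) + (1 - \lambda)\,C(a_1, a_2),$$
and similarly for $C/\sqrt{2}$. This expression determines the line containing C(A) and C(B) if $\lambda \in \mathbb{R}$, and the segment from C(A) to C(B) if $\lambda \in [0, 1]$.

Third, as regards the rotation of axes, the following formulas are general:
Rotation sinistrorsum: $e_1 = \cos\alpha\,\tilde e_1 - \sin\alpha\,\tilde e_2$, $\ e_2 = \sin\alpha\,\tilde e_1 + \cos\alpha\,\tilde e_2$
Rotation dextrorsum: $e_1 = \cos\alpha\,\tilde e_1 + \sin\alpha\,\tilde e_2$, $\ e_2 = -\sin\alpha\,\tilde e_1 + \cos\alpha\,\tilde e_2$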

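When the axes are rotated in one direction, it can be thought that the points are rotated in the opposite direction. Now, $C/\sqrt{2}$ can be written as a 45º dextrorsum rotation of the axes:
$$\begin{cases} e_1 = \cos\frac{\pi}{4}\,\tilde e_1 + \sin\frac{\pi}{4}\,\tilde e_2 = \frac{1}{\sqrt{2}}\tilde e_1 + \frac{1}{\sqrt{2}}\tilde e_2 \\ e_2 = -\sin\frac{\pi}{4}\,\tilde e_1 + \cos\frac{\pi}{4}\,\tilde e_2 = -\frac{1}{\sqrt{2}}\tilde e_1 + \frac{1}{\sqrt{2}}\tilde e_2 \end{cases}$$
Any point $P = (x, y) = x e_1 + y e_2$ is then expressed as
$$P = \frac{x - y}{\sqrt{2}}\,\tilde e_1 + \frac{x + y}{\sqrt{2}}\,\tilde e_2 = \frac{u}{\sqrt{2}}\,\tilde e_1 + \frac{v}{\sqrt{2}}\,\tilde e_2.$$
The matrix $M = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ is orthogonal. This means that $M^{-1} = M^t$, that is, $MM^t = I = M^tM$. Since $(e_1, e_2)^t = M^t(\tilde e_1, \tilde e_2)^t$, it implies that $(\tilde e_1, \tilde e_2)^t = M(e_1, e_2)^t$.

Conclusion: We have applied different approaches to study the frontier and the two regions determined by the given equality. Fortunately, nowadays the computer allows us to do this work even without any deeper theoretical study (change of variables, transformation, et cetera).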
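The geometric claims about C can be checked numerically:
- distances are multiplied by √2;
- the point λB + (1 − λ)A is mapped to λC(B) + (1 − λ)C(A);
- C/√2 equals the rotation matrix of angle π/4 and is orthogonal.

The following is a minimal base-R sketch using the triangle P1 = (1, 1), P2 = (2, 1), P3 = (1, 2). The value λ = 0.3 is an arbitrary choice for the linearity check.

```r
C <- matrix(c(1, 1, -1, 1), nrow = 2)     # columns: C(e1) = (1, 1), C(e2) = (-1, 1)
P <- cbind(P1 = c(1, 1), P2 = c(2, 1), P3 = c(1, 2))
CP <- C %*% P                             # images of the triangle's vertices
print(CP)

# Pairwise distances between vertices: the images are sqrt(2) times farther apart
ratio <- as.vector(dist(t(CP))) / as.vector(dist(t(P)))

# Linearity: C(lambda B + (1 - lambda) A) = lambda C(B) + (1 - lambda) C(A)
lambda <- 0.3
A <- P[, "P1"]; B <- P[, "P2"]
lhs <- C %*% (lambda * B + (1 - lambda) * A)
rhs <- lambda * (C %*% B) + (1 - lambda) * (C %*% A)

# C / sqrt(2) coincides with the rotation matrix of angle pi/4 and is orthogonal
M <- C / sqrt(2)
alpha <- pi / 4
R <- matrix(c(cos(alpha), sin(alpha), -sin(alpha), cos(alpha)), nrow = 2)
print(round(M %*% t(M), 12))              # identity matrix
```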

References
Remark 1r: When an exercise is based on another one from a book, the reference has been included below the statement; some statements may have been taken from official exams. I have written the entire solutions. The slides mentioned in the prologue contain references on theory. For some specific theoretical details, some literature is referred to in the proper section of this document.
[1] The R Project for Statistical Computing, http://www.r-project.org/
[2] Wikipedia, http://en.wikipedia.org/

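Tables of Statistics

Basic Measures
Mean: $\mu = E(X) = \sum_{\Omega} x_i f(x_i)$ (discrete), $\mu = E(X) = \int_{\Omega} x f(x)\,dx$ (continuous)
Variance: $\sigma^2 = Var(X) = E([X - \mu]^2) = E(X^2) - \mu^2$

Basic Estimators
Definitions:
- $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$
- $V^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2$
- $s^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$
- $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$, so that $n s^2 = (n-1)S^2$
- $\hat\eta = \frac{1}{n}\sum_{i=1}^n X_i$, for a Bernoulli population

Pooled estimators for two populations:
- $S_p^2 = \frac{(n_X - 1)S_X^2 + (n_Y - 1)S_Y^2}{n_X + n_Y - 2}$
- $\hat\eta_p = \frac{n_X\hat\eta_X + n_Y\hat\eta_Y}{n_X + n_Y}$

One population (parameter: estimator):
- $\mu$: $\bar X$
- $\sigma^2$, $\mu$ known: $V^2$
- $\sigma^2$, $\mu$ unknown: $s^2$ or $S^2$
- $\eta$: $\hat\eta$

Two populations (parameter: estimator):
- $\mu_X - \mu_Y$: $\bar X - \bar Y$
- $\sigma_X^2/\sigma_Y^2$, $\mu_X$ and $\mu_Y$ known: $V_X^2/V_Y^2$
- $\sigma_X^2/\sigma_Y^2$, $\mu_X$ and $\mu_Y$ unknown: $s_X^2/s_Y^2$ or $S_X^2/S_Y^2$
- $\eta_X - \eta_Y$: $\hat\eta_X - \hat\eta_Y$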

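Basic Statistics

One normal population, any n:
- $\mu$, $\sigma^2$ known: $T(X; \mu) = \frac{\bar X - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$, since $\bar X \sim N(\mu, \sigma^2/n)$
- $\mu$, $\sigma^2$ unknown: $T(X; \mu) = \frac{\bar X - \mu}{\sqrt{S^2/n}} \sim t_{n-1}$
- $\sigma^2$, $\mu$ known: $T(X; \sigma) = \frac{nV^2}{\sigma^2} \sim \chi^2_n$
- $\sigma^2$, $\mu$ unknown: $T(X; \sigma) = \frac{ns^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$

Two independent normal populations, any $n_X$ and $n_Y$:
- $\mu_X - \mu_Y$, $\sigma_X^2$ and $\sigma_Y^2$ known: $T(X, Y; \mu_X, \mu_Y) = \frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}} \sim N(0, 1)$, since $\bar X - \bar Y \sim N\left(\mu_X - \mu_Y, \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}\right)$
- $\mu_X - \mu_Y$, $\sigma_X^2$ and $\sigma_Y^2$ unknown: $T(X, Y; \mu_X, \mu_Y) = \frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{S_X^2/n_X + S_Y^2/n_Y}} \sim t_k$, where k is the closest integer to $\frac{(S_X^2/n_X + S_Y^2/n_Y)^2}{\frac{(S_X^2/n_X)^2}{n_X - 1} + \frac{(S_Y^2/n_Y)^2}{n_Y - 1}}$
- $\sigma_X^2/\sigma_Y^2$, $\mu_X$ and $\mu_Y$ known: $T(X, Y; \sigma_X, \sigma_Y) = \frac{V_X^2/\sigma_X^2}{V_Y^2/\sigma_Y^2} \sim F_{n_X, n_Y}$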

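- $\sigma_X^2/\sigma_Y^2$, $\mu_X$ and $\mu_Y$ unknown: $T(X, Y; \sigma_X, \sigma_Y) = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} = \frac{n_X s_X^2/[(n_X - 1)\sigma_X^2]}{n_Y s_Y^2/[(n_Y - 1)\sigma_Y^2]} \sim F_{n_X - 1, n_Y - 1}$

One population, large n:
- $\mu$: $T(X; \mu) = \frac{\bar X - \mu}{\sqrt{?/n}} \xrightarrow{d} N(0, 1)$, where ? is substituted by $\sigma^2$, $S^2$ or $s^2$; this follows from $\bar X \xrightarrow{d} N(\mu, ?/n)$
- $\eta$: $T(X; \eta) = \frac{\hat\eta - \eta}{\sqrt{?(1 - ?)/n}} \xrightarrow{d} N(0, 1)$, where ? is substituted by $\eta$ or $\hat\eta$; this follows from $\hat\eta \xrightarrow{d} N(\eta, ?(1 - ?)/n)$

Two independent populations, large $n_X$ and $n_Y$:
- $\mu_X - \mu_Y$: $T(X, Y; \mu_X, \mu_Y) = \frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{?_X/n_X + ?_Y/n_Y}} \xrightarrow{d} N(0, 1)$, where for each population ? is substituted by $\sigma^2$, $S^2$ or $s^2$
- $\eta_X - \eta_Y$: $T(X, Y; \eta_X, \eta_Y) = \frac{(\hat\eta_X - \hat\eta_Y) - (\eta_X - \eta_Y)}{\sqrt{?_X(1 - ?_X)/n_X + ?_Y(1 - ?_Y)/n_Y}} \xrightarrow{d} N(0, 1)$, where for each population ? is substituted by $\eta$ or $\hat\eta$

Remark 1T: For normal populations, the rules that govern addition and subtraction imply that $\bar X \sim N(\mu_X, \sigma_X^2/n_X)$ and $\bar Y \sim N(\mu_Y, \sigma_Y^2/n_Y)$, and hence $\bar X \pm \bar Y \sim N(\mu_X \pm \mu_Y, \sigma_X^2/n_X + \sigma_Y^2/n_Y)$. The tables include results combining these rules with a standardization or studentization. We are usually interested in comparing the means of the two populations, for which the difference is considered. Nevertheless, the addition can also be considered, with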

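$$\frac{(\bar X + \bar Y) - (\mu_X + \mu_Y)}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}} \sim N(0, 1).$$
On the other hand, the quality of estimators (e.g. measured through the mean square error) increases with the sample size. Therefore, when the parameters of two populations are supposed to be equal, the samples should be merged to estimate the parameter jointly, especially for small $n_X$ and $n_Y$. Then, under the hypothesis $\sigma_X = \sigma_Y$, the pooled sample quasivariance should be used through the statistic:
$$T(X, Y; \mu_X, \mu_Y) = \frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{S_p^2/n_X + S_p^2/n_Y}} \sim t_{n_X + n_Y - 2}.$$

Remark 2T: For any populations with finite mean and variance, one version of the Central Limit Theorem implies that $\bar X \xrightarrow{d} N(\mu_X, \sigma_X^2/n_X)$ and $\bar Y \xrightarrow{d} N(\mu_Y, \sigma_Y^2/n_Y)$. Hence $\bar X \pm \bar Y \xrightarrow{d} N(\mu_X \pm \mu_Y, \sigma_X^2/n_X + \sigma_Y^2/n_Y)$. Two sets of rules are applied here:
- the rules that govern the convergence in distribution of sums and differences of sequences of random variables (see a text on Probability Theory);
- the rules that govern the addition and subtraction of normally distributed variables.

We are usually interested in comparing the means of the two populations, for which the difference is considered. Nevertheless, the addition can also be considered, with
$$\frac{(\bar X + \bar Y) - (\mu_X + \mu_Y)}{\sqrt{?_X/n_X + ?_Y/n_Y}} \xrightarrow{d} N(0, 1)$$
and, for a Bernoulli population,
$$\frac{(\hat\eta_X + \hat\eta_Y) - (\eta_X + \eta_Y)}{\sqrt{?_X(1 - ?_X)/n_X + ?_Y(1 - ?_Y)/n_Y}} \xrightarrow{d} N(0, 1).$$
Besides, variances can be estimated when they are unknown. By applying theorems from Approximation Theorems of Mathematical Statistics, by R.J. Serfling (John Wiley & Sons), and from Probability and Random Processes, by G. Grimmett and D. Stirzaker (Oxford University Press),
$$\frac{\bar X - \mu}{\sqrt{S^2/n}} = \frac{\bar X - \mu}{\sqrt{\sigma^2/n}}\sqrt{\frac{\sigma^2}{S^2}} \xrightarrow{d} N(0, 1)$$
and
$$\frac{\hat\eta - \eta}{\sqrt{\hat\eta(1 - \hat\eta)/n}} = \frac{\hat\eta - \eta}{\sqrt{\eta(1 - \eta)/n}}\sqrt{\frac{\eta(1 - \eta)}{\hat\eta(1 - \hat\eta)}} \xrightarrow{d} N(0, 1).$$
Similarly for two populations. From the first convergence (the Bernoulli population being a particular case), it is deduced that
$$\frac{\hat\eta - \eta}{\sqrt{\eta(1 - \eta)/n}} \xrightarrow{d} N(0, 1).$$
On the other hand, when the parameters of two populations are supposed to be equal, the samples should be merged to estimate the parameter jointly, especially for medium $n_X$ and $n_Y$.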
Then, under the hypothesis $\sigma_X = \sigma_Y$, the pooled sample quasivariance should be used through the statistic below, although in some cases its effect is negligible:
$$T(X, Y; \mu_X, \mu_Y) = \frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{S_p^2/n_X + S_p^2/n_Y}} \xrightarrow{d} N(0, 1).$$
For a Bernoulli population, under the hypothesis $\eta_X = \eta_Y$, the pooled sample proportion should be used in the denominator of the statistic, although in some cases the effect is negligible:
$$T(X, Y; \eta_X, \eta_Y) = \frac{(\hat\eta_X - \hat\eta_Y) - (\eta_X - \eta_Y)}{\sqrt{\hat\eta_p(1 - \hat\eta_p)/n_X + \hat\eta_p(1 - \hat\eta_p)/n_Y}} \xrightarrow{d} N(0, 1).$$

Remark 3T: In the last tables, the best information available should be used in place of the symbol ?.

Remark 4T: The Bernoulli population is a particular case for which $\mu = \eta$ and $\sigma^2 = \eta(1 - \eta)$. So $\eta$ (or $\hat\eta$) is used in place of ?, and the product $?(1 - ?)$ plays the role of the variance. When the variance $\sigma^2$ is directly estimated without estimating $\eta$, that estimate can be used in place of the product.

Remark 5T: Once an interval for the variance is obtained, $P(a_1 < \sigma^2 < a_2)$, an interval for the standard deviation is given by $P(\sqrt{a_1} < \sigma < \sqrt{a_2})$. This holds because the positive square root is a strictly increasing function, and therefore it preserves the order between two values. Notice that, for a reasonable initial interval, $0 < a_1$. Similarly for the quotient of two variances $\sigma_X^2/\sigma_Y^2$.
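The two-population statistics of these tables can be compared with base R's `t.test()` and `prop.test()`. The following is a minimal sketch on simulated data; the seed, sample sizes and counts are arbitrary. Two differences from the tables are worth knowing:
- `t.test()` uses the unrounded Welch-Satterthwaite degrees of freedom, while the table rounds them to the closest integer.
- `prop.test()` without continuity correction reports a chi-square statistic, which equals the square of the pooled-proportion statistic.

```r
set.seed(1)
x <- rnorm(12, mean = 10, sd = 2)
y <- rnorm(18, mean = 9, sd = 4)
nX <- length(x); nY <- length(y)

# var() returns the sample quasivariance S^2 (denominator n - 1)
se2 <- var(x) / nX + var(y) / nY
T_welch <- (mean(x) - mean(y)) / sqrt(se2)            # under H0: mu_X - mu_Y = 0
k_exact <- se2^2 / ((var(x) / nX)^2 / (nX - 1) + (var(y) / nY)^2 / (nY - 1))
k <- round(k_exact)                                   # closest integer, as in the table
welch <- t.test(x, y, var.equal = FALSE)

# Pooled sample quasivariance under sigma_X = sigma_Y
Sp2 <- ((nX - 1) * var(x) + (nY - 1) * var(y)) / (nX + nY - 2)
T_pooled <- (mean(x) - mean(y)) / sqrt(Sp2 / nX + Sp2 / nY)
pooled <- t.test(x, y, var.equal = TRUE)

# Pooled sample proportion under eta_X = eta_Y (hypothetical counts)
successes <- c(45, 30); sizes <- c(100, 90)
eta_hat <- successes / sizes
eta_p <- sum(successes) / sum(sizes)
T_prop <- (eta_hat[1] - eta_hat[2]) /
  sqrt(eta_p * (1 - eta_p) / sizes[1] + eta_p * (1 - eta_p) / sizes[2])
prop <- prop.test(successes, sizes, correct = FALSE)

print(c(T_welch = T_welch, k = k, T_pooled = T_pooled, T_prop = T_prop))
```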

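Statistics Based on Λ
One population, any n. Parameter $\theta$ (1 dimension) or $\theta$ (r dimensions).
Statistic:
$$\Lambda = \frac{L(X; \theta_0)}{L(X; \theta)},$$
and, asymptotically, $-2\ln\Lambda \xrightarrow{d} \chi^2_r$.

Analysis of Variance (ANOVA)
P independent normal populations; one factor, fixed effects.
- Between-group measures: $SSG = \sum_{p=1}^{P} n_p(\bar X_p - \bar X)^2$, $\quad MSG = \frac{SSG}{P - 1}$
- Within-group measures: $SSW = \sum_{p=1}^{P}\sum_{i=1}^{n_p}(X_{p,i} - \bar X_p)^2 = \sum_{p=1}^{P}(n_p - 1)S_p^2$, $\quad MSW = \frac{SSW}{n - P}$
- Total measures: $SST = \sum_{p=1}^{P}\sum_{i=1}^{n_p}(X_{p,i} - \bar X)^2 = SSW + SSG$

Here $n = \sum_{p} n_p$, $\bar X_p = \frac{1}{n_p}\sum_{i} X_{p,i}$ and $\bar X = \frac{1}{n}\sum_{p}\sum_{i} X_{p,i}$.
Statistic: $T_0 = \frac{MSG}{MSW} \sim F_{P-1,\, n-P}$

Nonparametric Hypothesis Tests

Chi-Square Tests
Goodness-of-Fit
- Data: $X_1, \ldots, X_n$, grouped into K classes; model $F_0$.
- Null hypothesis, H0: The sample comes from the model $F_0$.
- Statistic:
$$T_0 = \sum_{i=1}^{K}\frac{(N_i - \hat e_i)^2}{\hat e_i} \xrightarrow{d} \chi^2_{K-1-s},$$
where s parameters are estimated and $\hat e_i = n\hat p_i = n P_{\hat\theta}(i\text{th class})$. If no parameter is estimated, $s = 0$ and $e_i = n p_i = n P_{\theta}(i\text{th class})$.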

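Homogeneity
- Data: L samples $X_{j,1}, \ldots, X_{j,n_j}$, $j = 1, \ldots, L$, grouped into K classes.
- Null hypothesis, H0: The samples come from the same model.
- Statistic:
$$T_0 = \sum_{i=1}^{K}\sum_{j=1}^{L}\frac{(N_{ij} - \hat e_{ij})^2}{\hat e_{ij}} \xrightarrow{d} \chi^2_{KL - L - (K-1)} = \chi^2_{(K-1)(L-1)},$$
where $\hat e_{ij} = n_j\hat p_i = n_j\frac{N_i}{n}$.

Independence
- Data: a bivariate sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, grouped into KL classes (K classes for one variable and L for the other).
- Null hypothesis, H0: The bivariate sample comes from two independent models.
- Statistic:
$$T_0 = \sum_{i=1}^{K}\sum_{j=1}^{L}\frac{(N_{ij} - \hat e_{ij})^2}{\hat e_{ij}} \xrightarrow{d} \chi^2_{KL - 1 - (K-1) - (L-1)} = \chi^2_{(K-1)(L-1)},$$
where $\hat e_{ij} = n\hat p_{ij} = n\hat p_i\hat p_j = \frac{N_i N_j}{n}$.

Remark 6T: The theoretical reasons are different, but for the practical estimation of $e_{ij}$ the same mnemonic rule can be used in both the homogeneity and the independence tests. For each position, multiply the absolute frequencies of the row and the column, and divide by the total number of elements.

Kolmogorov-Smirnov Tests
Goodness-of-Fit
- Data: $X_1, \ldots, X_n$; model $F_0$.
- Null hypothesis, H0: The sample comes from the model $F_0$.
- Statistic: $T_0 = \max_x |F_n(x) - F_0(x)|$, where $F_0(x) = P(X \le x)$ under the model and $F_n(x) = \frac{\text{Number}\{X_i \le x\}}{n}$.

Homogeneity
- Data: $X_1, \ldots, X_{n_X}$ and $Y_1, \ldots, Y_{n_Y}$.
- Null hypothesis, H0: The samples come from the same model.
- Statistic: $T_0 = \max_t |F_{n_X}(t) - F_{n_Y}(t)|$, where $F_{n_X}(t) = \frac{\text{Number}\{X_i \le t\}}{n_X}$ and $F_{n_Y}(t) = \frac{\text{Number}\{Y_i \le t\}}{n_Y}$.

Other Tests
Runs Test of Randomness
- Data: $X_1, \ldots, X_n$ with a dichotomous property: $n_{yes}$ elements have it and $n_{no}$ elements do not.
- Null hypothesis, H0: The sample is simple and random (it has been selected by applying simple random sampling).
- Statistic: let R be the number of runs. If $n_{yes} < 20$ or $n_{no} < 20$, use $T_0 = R$ with the specific table. For $n_{yes} \ge 20$ and $n_{no} \ge 20$, use
$$T_0 = \frac{R - \mu}{\sigma} \xrightarrow{d} N(0, 1), \quad \mu = \frac{2n_{yes}n_{no}}{n} + 1, \quad \sigma^2 = \frac{2n_{yes}n_{no}(2n_{yes}n_{no} - n)}{n^2(n - 1)},$$
with the table of the standard normal distribution.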

Signs Test of Position
- Data: $X_1, \ldots, X_n$; model $F_0$; a position measure Q (e.g. the median).
- Null hypothesis, H0: The population measure Q takes the value $q_0$.
- Statistic: $T_0 = \text{Number}\{X_i - q_0 > 0\}$. If $n < 20$, use the specific table or the table of the Binomial(n, p), where p depends on Q (e.g. 1/2 for the median). For $n \ge 20$, use
$$T_0 = \frac{T - \mu}{\sigma} \xrightarrow{d} N(0, 1), \quad \mu = np, \quad \sigma^2 = np(1 - p),$$
with the table of the standard normal distribution.

Wilcoxon Signed-Rank Test of Position
- Data: $X_1, \ldots, X_n$; model $F_0$; a position measure Q (e.g. the median).
- Null hypothesis, H0: The population measure Q takes the value $q_0$.
- Statistic: $T_0 = \sum_{\{X_i - q_0 > 0\}} R_i$, where the $R_i$ are the positions in the increasing sequence of the $|X_i - q_0|$. If $n < 20$, use the specific table. For $n \ge 20$, use
$$T_0 = \frac{T - \mu}{\sigma} \xrightarrow{d} N(0, 1), \quad \mu = \frac{n(n + 1)}{4}, \quad \sigma^2 = \frac{n(n + 1)(2n + 1)}{24},$$
with the table of the standard normal distribution.

Remark 7s: In the statistics, the parameter of interest is the unknown for confidence intervals, while it is supposed to be known for hypothesis tests.

Remark 8s: Usually the estimators involved in the statistic T (like s, S...) and the quantiles (like a...) also depend on the sample size n, although the notation is simplified.

Remark 9s: For big sample sizes, when the Central Limit Theorem can be applied to T or its standardization, quantiles or probabilities that are not tabulated can be approximated. Given a, p is calculated directly. Given p, a is calculated from the quantile z of the standard normal distribution:
$$p = P(T \le a) \approx P\left(Z \le \frac{a - E(T)}{\sqrt{Var(T)}}\right) \;\Rightarrow\; z = \frac{a - E(T)}{\sqrt{Var(T)}} \;\Rightarrow\; a = E(T) + z\sqrt{Var(T)}.$$
This is used in the asymptotic approximations proposed for the tests of the last table.

Remark 10s: For the approximations, sample sizes bigger than 20 have been proposed in the last table. Other cutoff values can be found in the literature, like 8, 10 or 30; in practice, there is no severe change at any particular value.

Remark 11s: The goodness-of-fit chi-square test can also be used to test position measures, by considering two classes with probabilities p and 1 − p.

Remark 12s: To test the symmetry of a distribution, the position tests can be used.
Remark 13s: Different types of test can be applied to evaluate the same hypotheses H0 and H1 with the same α (type I error), but their quality is usually different, so β (type II error) should be taken into account. A global comparison can be done by using their power functions.
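A few entries of the last tables can be reproduced with base R:
- the ANOVA statistic and the decomposition SST = SSW + SSG;
- the mnemonic rule of Remark 6T for expected frequencies, compared with `chisq.test()`;
- the mean and variance of the Wilcoxon signed-rank statistic, checked by exhaustive enumeration of all sign patterns for n = 10.

This is a sketch only. The simulated groups and the contingency table are hypothetical data.

```r
# One-factor ANOVA on simulated, hypothetical groups
set.seed(2)
groups <- list(A = rnorm(5, 10), B = rnorm(7, 12), C = rnorm(6, 11))
n_p <- sapply(groups, length)
P <- length(groups); n <- sum(n_p)
all_x <- unlist(groups)
grand <- mean(all_x)
SSG <- sum(n_p * (sapply(groups, mean) - grand)^2)
SSW <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))
SST <- sum((all_x - grand)^2)
T0_anova <- (SSG / (P - 1)) / (SSW / (n - P))
dat <- data.frame(value = all_x, group = factor(rep(names(groups), n_p)))
fit <- anova(lm(value ~ group, data = dat))

# Chi-square homogeneity: K = 3 classes (rows) x L = 2 samples (columns)
N <- matrix(c(20, 30, 50,
              25, 35, 40), nrow = 3)
e_hat <- outer(rowSums(N), colSums(N)) / sum(N)   # row total x column total / total
T0_chisq <- sum((N - e_hat)^2 / e_hat)
ct <- chisq.test(N, correct = FALSE)

# Wilcoxon signed-rank under H0: each rank is positive with probability 1/2
m <- 10
signs <- as.matrix(expand.grid(rep(list(0:1), m)))  # all 2^m sign patterns
W <- as.vector(signs %*% (1:m))                     # sum of the positive ranks

print(c(T0_anova = T0_anova, T0_chisq = T0_chisq,
        mean_W = mean(W), var_W = mean((W - mean(W))^2)))
```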

Probability Tables

Standard Normal
$$p = P(Z \le z) = \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx, \quad \text{for } z \in \mathbb{R}$$
Taken from: Kokoska, S., and C. Nevison. Statistical Tables and Formulae. Springer-Verlag, 1989.
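The entries of the standard normal table can be reproduced by numerically integrating the density in the formula above. The following R sketch compares `integrate()` applied to that density with base R's `pnorm()`, at a few arbitrarily chosen values of z.

```r
# p = P(Z <= z) = integral from -Inf to z of exp(-x^2/2) / sqrt(2*pi)
phi <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)
z <- c(-1.96, -1, 0, 0.5, 1.645, 2.33)
p_integral <- sapply(z, function(zz)
  integrate(phi, lower = -Inf, upper = zz, rel.tol = 1e-10)$value)
p_pnorm <- pnorm(z)
print(round(cbind(z, p_integral, p_pnorm), 4))
```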