* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

Similar documents
Statistical methods and data analysis

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

HANDBOOK OF APPLICABLE MATHEMATICS

Stat 5101 Lecture Notes

Data Analysis. with Excel. An introduction for Physical scientists. LesKirkup university of Technology, Sydney CAMBRIDGE UNIVERSITY PRESS

Kumaun University Nainital

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Index. Cambridge University Press Data Analysis for Physical Scientists: Featuring Excel Les Kirkup Index More information

Applied Probability and Stochastic Processes

Subject CS1 Actuarial Statistics 1 Core Principles

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook

Introduction to Statistical Analysis

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University

Experimental Design and Data Analysis for Biologists

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Psychology 282 Lecture #4 Outline Inferences in SLR

3 Joint Distributions 71

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1

Chapter 1 Statistical Inference

Exam Applied Statistical Regression. Good Luck!

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Multivariate Regression

Contents. Acknowledgments. xix

Exam details. Final Review Session. Things to Review

Statistical Methods for Astronomy

Fundamental Probability and Statistics

Practical Statistics

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

Hypothesis testing:power, test statistic CMS:

Data modelling Parameter estimation

Confidence Intervals, Testing and ANOVA Summary

Practical Statistics for the Analytical Scientist Table of Contents

IDL Advanced Math & Stats Module

Statistical Methods for Astronomy

Statistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames

Statistical Methods in Particle Physics

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

Can you tell the relationship between students SAT scores and their college grades?

Using R in Undergraduate and Graduate Probability and Mathematical Statistics Courses*

Institute of Actuaries of India

Robustness and Distribution Assumptions

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Statistical techniques for data analysis in Cosmology

Inferences for Regression

Data Fitting and Uncertainty

Index. Geostatistics for Environmental Scientists, 2nd Edition R. Webster and M. A. Oliver 2007 John Wiley & Sons, Ltd. ISBN:

Course topics (tentative) The role of random effects

p(z)

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear

Lecture 7: Hypothesis Testing and ANOVA

Statistics. Lent Term 2015 Prof. Mark Thomson. 2: The Gaussian Limit

Correlation and Simple Linear Regression

Math 562 Homework 1 August 29, 2006 Dr. Ron Sahoo

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 1: Introduction, Multivariate Location and Scatter

MTH 163, Sections 40 & 41 Precalculus I FALL 2015

Linear Regression Models

First Year Examination Department of Statistics, University of Florida

Correlation and Regression

Chapter 14. Linear least squares

LECTURE NOTES FYS 4550/FYS EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2013 PART I A. STRANDLIE GJØVIK UNIVERSITY COLLEGE AND UNIVERSITY OF OSLO

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

SPSS Guide For MMI 409

STATISTICS SYLLABUS UNIT I

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests

Basic Statistical Analysis

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Measuring relationships among multiple responses

Statistical Analysis of Engineering Data The Bare Bones Edition. Precision, Bias, Accuracy, Measures of Precision, Propagation of Error

Generalized, Linear, and Mixed Models

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Econometrics. 4) Statistical inference

16.400/453J Human Factors Engineering. Design of Experiments II

STATISTICS-STAT (STAT)

My data doesn t look like that..

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

STAT 461/561- Assignments, Year 2015

Stat 5102 Final Exam May 14, 2015

Hypothesis Testing for Var-Cov Components

Chemometrics. Matti Hotokka Physical chemistry Åbo Akademi University

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

Methods of statistical and numerical analysis (integrated course). Part I

Review of Statistics

STAT 501 EXAM I NAME Spring 1999

Linear Models in Statistics

If we want to analyze experimental or simulated data we might encounter the following tasks:

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 19

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Correlation and Linear Regression

Multiple Linear Regression

Nonparametric Bayesian Methods (Gaussian Processes)

Open Problems in Mixed Models

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

Regression. Estimation of the linear function (straight line) describing the linear component of the joint relationship between two variables X and Y.

Contents LIST OF TABLES... LIST OF FIGURES... xvii. LIST OF LISTINGS... xxi PREFACE. ...xxiii

Transcription:

Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course is to illustrate the basic mathematical tools for the analysis and modeling of experimental data, particularly concering the main statistical methods. The idea is not simply to describe and discuss the "cooking recipes" for the statistical treatment of data, but also to introduce and analyze in a critical way the statistical models implemented by the computation procedures, stressing their limitations and the actual interpretation of the results. Period of lectures From 17 January to (tentatively) 15 March 2012 The course consists in 18 lectures of 2 hours each, comprehensive of exercises. Conditions for obtaining credits For the first part, participants are required to pass a written examination consisting in the application of some statistical techniques (calculation of confidence intervals, hypothesis testing, calculation of correlation coefficients, linear regressions by chi-square or least squares fitting --- regression straight lines). For the topics of the second part, homework and possible oral discussion. Lectures and topics * Tuesday 17 January 2012 14:30-16:30 (2 hours) Recored on ESSE3 General introduction to the course. (1) Introduction to experimental measurements Errors of measurement. Systematic and random errors. Absolute and relative errors. Accuracy and precision of a measurement. Postulate of the statistical population; the result of a measurement as an outcome of a random variable. (2) Random variables Discrete and continuous random variables. Cumulative frequency theoretical distribution of a random variable. Frequency theoretical distribution (probability distribution) of a random variable, in the continuous and in the discrete case. Mean, variance, skewness of a probability distribution. Tchebyshev s theorem. Examples of discrete probability distributions: Bernoulli s, binomial, Poisson s. Examples of continuous probability distributions: uniform, normal or Gaussian, chi-square with n degrees of freedom (X 2 ), Student s with n degrees of freedom (t), Fisher s with n1 and n2 degrees of freedom (F). Central Limit Theorem. Multivariate random variables and related probability distributions;

(stochastically) dependent and independent random variables; covariance matrix; correlation matrix; uncorrelated random variables; stochastic independence as a sufficient but not necessary condition to the lack of correlation; * Wednesday 18 January 2012 14:30-16:30 (2 hours) Recorded on ESSE3 multivariate normal random variables, stochastic independence as a condition equivalent to the lack of correlation for multivariate normal random variables. (3) Functions of random variables Functions of random variables and indirect measurements. Linear combination of random variables: calculation of mean and variance, case of uncorrelated random variables; normal random variables, calculation of the joint probability distribution (the linear combination is again a normal random variable). Quadratic forms of standard normal random variables: Craig s theorem on the stochastic independence of positive semidefinite quadratic forms of independent standard normals; theorem for the characterization of quadratic forms of independent standard normals which obey a chi-square probability distribution. Illustrative examples. Error propagation in the indirect measurements: law of propagation of random errors in the indirect measurements (Gauss law) example of application of Gauss law; method of the logarithmic differential, principle of equivalent effects, example of application of the logarithmic differential; error propagation in the solution of a linear set of algebraic equations, condition number, example of application to the solution of a system of 2 equations in 2 variables; * Thursday 19 January 2012 14:30-16:30 (2 hours) Recorded on ESSE3 probability distribution of a function of random variables by Monte-Carlo methods. (4) Sample theory. Sample estimates of mean and variance Estimate of the parameters of the statistical population (mean and variance). Confidence intervals for the mean and the variance in the case of a normal population. Example of calculation of the CI for mean and variance of a normal population. * Tuesday 31 January 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Remark on the probabilistic meaning of the confidence intervals. Calculation of the CIs for the mean and the variance by Excel and Maple. Large samples of an arbitrary statistical population: approximate CI for the mean, approximate CI for the variance, calculation of the critical values of a standard normal RV by Excel and Maple, example of calculation of the CI for the mean of a non-normal population by using a large sample, calculation of the CI for the mean of a non-normal population by Excel and Maple.

(5) Hypothesis testing Null hypothesis, type I errors, type II errors; Tests based on the rejection of the null hypothesis. * Wednesday 1 February 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Example illustrating the general concepts of hypothesis testing. Summary of the main hypothesis tests and their typologies (parametric and non parametric). X 2 -test to fit a sample to a given probability distribution. Example of X 2 -test applied to a uniform distribution in the unit interval [0,1]. Application of the X 2 -test to a uniform distribution in [0,1] by Excel and Maple (same example as before). * Thursday 2 February 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Example of X 2 -test applied to a normal distribution of unknown parameters. Application of the X 2 -test to a normal distribution of unknown parameters by Excel and Maple (same example as before). Kolmogorov-Smirnov test to fit a sample to a given continuous probability distribution. Quantitative and qualitative description of the Kolmogorov-Smirnov function by Maple. Example of KS test applied to a uniform distribution in the unit interval [0,1]. Remark on the calculation of the KS test statistics. Further example of KS test applied to a uniform distribution in the unit interval [0,1] by Excel. Example of KS test applied to a standard normal distribution. Application of the KS test to a standard normal distribution by Excel (same example as before). T-test to check if the mean of a normal population is equal to, smaller than or greater than an assigned value, respectively. Example of application of the t-test on the mean of a normal population, relation between the t-test on the mean and the CI for the mean. Application of the t-test on the mean of a normal population by Maple (same example as before). * Tuesday 7 February 2012 14:30-16:30 (2 hours) Recorded on ESSE3 X 2 -test to check whether the variance of a normal population is equal to, smaller than or greater than a given value, respectively. Example of application of the X 2 -test on the variance of a normal population, alternative form of the test based on the CI for the variance. Application of the X 2 -test on the variance of a normal population by Maple (same example as before). Z-test to compare the means of two independent normal populations of known variances. Example of application of the Z-test to compare the means of two independent normal populations of known variances. Application of the Z-test to compare the means of two independent normal populations of known variances by Excel and Maple (same example as before). F-test to compare the variances of two independent normal populations. Example of application of the F-test to compare the variances of two independent normal populations. Application of the F-test to compare the variances of two independent normal populations by Excel and Maple (on an example different from the previous one).

* Wednesday 8 February 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Unpaired t-test to compare the means of two independent normal populations with unknown variances: case of equal variances; case of unequal variances. Example of application of the unpaired t-test for the comparison of the means of two normal populations (equal unknown variances). Application of the unpaired t-test for the comparison of the means of two normal populations with equal unknown variances by Excel and Maple (same example as before). Example of application of the unpaired t-test for the comparison of the means of two normal populations (unequal unknown variances). Application of the unpaired t-test for the comparison of the means of two normal populations with unequal unknown variances by Excel and Maple (same example as before). Paired t-test to compare the means of two normal populations of unknown variances. Example of application of the paired t-test to compare the means of two normal populations. Application of the paired t-test to compare the means of two normal populations by Excel and Maple (same example as before). A remark about the Excel implementation of the paired t-test: relation with Pearson s linear correlation coefficient. Sign test for the median of a population. Sign test to check if two independent samples belong to the same unknown statistical population. Example of application of the sign test to check if two samples belong to the same unknown population. Discussion of the previous example by using Excel and Maple. * Thursday 9 February 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Detection of outliers non belonging to a statistical normal population by Chauvenet criterion. Calculation of Chauvenet critical values by Maple. Example of application of the Chauvenet criterion. (6) Pairs of random variables Pairs of random variables, covariance and linear correlation coefficient (Pearson r). Probability distributions for the linear correlation coefficient r and test to check the hypothesis of stochastic independence based on r: case of independent random variables with convergent moments of sufficiently high order, for sufficiently large samples (z-test); case of normal random variables - approximation for large samples, Fisher transformation (z-test), - independent normal random variables (t-test). Calculation of the linear correlation by Excel and Maple. Calculation of the sample estimate of the correlation matrix by Excel and Maple. Example of application of the linear correlation coefficient to check the stochastic independence of two normal random variables. Remark: linear correlation coefficient versus regression analysis. END OF THE FIRST PART OF THE COURSE

* Tuesday 21 February 2012 14:30-16:30 (2 hours) Recorded on ESSE3 A. F-test for the comparison of means. Introduction to ANOVA. Typical problem addressed by (1-factor) ANOVA and its matematical formulation. Factors and levels: 1-factor ANOVA, 2-factor ANOVA without replication, 2-factor ANOVA with replication, 2-factor ANOVA, etc. General mathematical formulation of ANOVA. A.1 1-factor ANOVA (one-way ANOVA) A.1.1 Basic definitions A.1.2 Statistical model of 1-factor ANOVA Fundamental theorem of 1-factor ANOVA and related remarks. F-test for the dependence of the means on the group. * Wednesday 22 February 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Example of application of 1-factor ANOVA. Application of the 1-factor ANOVA by Excel and Maple (same example as before). Remark. 1-factor ANOVA for groups with unequal numbers of data. A.1.3 Confidence intervals for the difference of two means: Student s approach and Scheffé s approach. Another axample of application of 1-factor ANOVA, discussed by using the Excel and Maple tools, calculation of the confidence intervals for the difference of the means by Student s method and by Scheffé s estimate. A.2 2-factor ANOVA without interaction. A.2.1 Basic definitions of 2-factor ANOVA without interaction. A.2.2 Statistical model of 2-factor ANOVA without interaction. Fundamental theorem of 2-factor ANOVA without interaction and related remarks. F-test for the dependence of the means on the index i (first factor). F-test for the dependence of the means on the index j (second factor). * Thursday 23 February 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Example of application of the 2-factor ANOVA without interaction. Application of the 2-factor ANOVA without interaction by Excel (same example as before). A.3 2-factor ANOVA with interaction A.3.1 Basic definitions of 2-factor ANOVA with interaction. A.3.2 Statistical model of 2-factor ANOVA with interaction. Fundamental theorem of 2-factor ANOVA with interaction and related remarks (hints). F-test for the dependence of the means on the index i (first factor). F-test for the dependence of the means on the index j (second factor). F-test for the occurrence of interactions between the factors. Example of application of the 2-factor ANOVA with interaction. Application of the 2-factor ANOVA with interaction by Excel (same example as before). * Wednesday 7 March 2012 14:30-16:30 (2 hours) Recorded on ESSE3 R. Data modeling. Introduction to regression analysis General setup of the problem: models, model parameters. Fitting of model parameters to the data of the small sample. Merit function. Calculation of the best values of the model parameters (best-fit). Maximum likelihood method for the definition of the merit function.

Some important cases: least-squares method; weighted least-squares method, or chi-square method; robust fitting methods for non-gaussian data, general notion of outlier. Linear regression by the chi-square method. Linear regression by the least-squares method as a particular case. Determination of the best-fit parameters by the normal equation. The best-fit parameters as normal random variables. Mean and covariance matrix of the best-fit parameters. The normalized sum of squares of residuals around the regression (NSSAR) as a chi-square random variable. Stochastic independence of the estimated parameters. Goodness of fit Q of the regression model. Extreme values of the goodness of fit Q. * Thursday 8 March 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Case of equal and a priori unknown standard deviations, estimate of the standard deviations of the data by the chi-square method. F-test for the goodness of fit (in the presence of an additive parameter). F-test with repeated observations: case of unequal, known variances; case of equal, possibly unknown variances. Incremental F-test on the introduction of a further fitting parameter. * Friday 9 March 2012 9:00-11:00 (2 hours) Recorded on ESSE3 Self-consistency tests of the regression model for homoskedastic systems with unknown variance: coefficient of determination and adjusted coefficient of determination, line fit plot, residuals and residual plot, standardized residuals and normal probability plot. Confidence intervals for the regression parameters. Confidence intervals for predictions. Important case. Regression straight line: regression straight line, basic model; regression straight line, alternative model; confidence intervals for the regression parameters, basic model; confidence intervals for the regression parameters, alternative model; confidence interval for predictions, alternative model, confidence region; confidence interval for predictions, basic model. * Wednesday 14 March 2012 14:30-16:30 (2 hours) Recorded on ESSE3 Example of regression straight line in a homoscedastic model, confidence intervals for the intercept and the slope, confidence region, confidence interval for the prediction at a given value of the abscissa, goodness of fit for an assigned value of the variance. Maple implementation of the linear regression. Excel implementation of the linear regression. Excel implementation of the linear regression for the case of no additive parameter. Linear regression with more variables. Example of multiple linear regression.

Topics treated in the lecture notes: Chi-square fitting by SVD. Singular Value Decomposition of a matrix. SVD and covariance matrix of the estimated best-fit parameters. Example of SVD by Maple. * Thursday 15 March 2012 14:30-16:30 (2 hours) Recorded on ESSE3 N. Nonlinear regression N.1 Nonlinear models N.2 Maple implementation of nonlinear regression. Example of a nonlinear regression model reducible to a linear one. N.3 Example of a nonlinear regression model in one dependent variable. N.4 Example of a nonlinear regression model with two independent variables. N.5 Robust fitting. N.5.1 Estimate of the best-fit parameters of a model by local M-estimates. N.5.2 Excel and Maple implementation of robust fitting N.5.3 Example of robust regression straight line N.6 Multiple regression * Friday 16 March 2012 9:00-11:00 (2 hours) Recorded on ESSE3 Principal Component Analysis (PCA). Basic ideas. Principal component model. Principal component model for predictions. Relationship between PCA and SVD. Use of PCA for the grouping of data. PCA for nonb-centred data (General PCA). Implementation of PCA by R. Illustrative example of application of PCA. Further example of PCA.