Introduction to Statistical Hypothesis Testing


Statistics for Hypothesis Testing - Part 1
Arun K. Tangirala, IIT Madras

Learning objectives
- Sampling and estimation
- Statistics
- Sample mean, variance, proportion and correlation
- Sampling distributions: concepts
- Distribution of the sample mean

Practical aspects and limitations of rigorous analysis
Limitations of dealing with probability distributions:
- Sufficient process knowledge and the entire outcome space are not easily available.
- Theoretical construction and estimation of f(x) is therefore hampered.
- Alternatives?
- Available: finite observations of the phenomena.
- Can we say something about the ensemble behaviour?

From probability to statistics
Probability: Enables analysis of random phenomena and their expected behaviour using f(x).
Statistics means moving from the ensemble space to the finite observation space.
- Essentially, shifting from outcomes to observations.
- What kind of analysis is necessary?
- A systematic and efficient way of extracting information.
- Importantly, what type of data should be acquired, and how?
- Observations should contain "sufficient" information about f(x).
Statistical analysis provides answers to both questions.

Central concepts
1. Population: Complete collection of all possibilities corresponding to a random experiment, characterized by the p.d.f. $f(x; \theta)$.
2. Sample: Specific set of observations obtained from a single experiment. Also known as a realization.
3. Inference: A conclusion made about the population from the data.
- A finite data set is only a representative of the population. Hence, every inference has a degree of uncertainty.
Example: average solid propellant burning rate, defective parts.

The three entities
Statistics: Aids in analysing random phenomena using finite data sets by either partly or fully reconstructing f(x).
1. Random variable, X: Actual variable(s) of interest.
2. Observation set (data), $\{x_i\}_{i=1}^{n}$: Available entity.
3. Ensemble description, f(x): Not available. Has to be inferred or estimated from data.
- We shall use the notation $f(x; \theta)$ to indicate its dependence on the parameter vector $\theta$ (e.g., a single parameter, or $\theta = [\mu \;\; \sigma^2]^T$).

Important assumption
The finite observation set is representative of the population (informative data). This is achieved through a proper design of the experiment.

Statistical analysis
1. Analysis of data (a.k.a. descriptive statistics, exploratory data analysis)
   i. Graphical: Organizing, presenting and summarizing data, e.g., boxplots.
   ii. Numerical: Computing descriptive statistics, e.g., mean, median, range.
2. Drawing inferences (a.k.a. inductive statistics, inferential statistics)
   i. Estimation: Determining unknowns (parameters) from data, e.g., mean, variance.
   ii. Hypothesis testing: Validating a postulate regarding f(x) or its parameters, using the data as the source of evidence.
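As a small illustration of the numerical branch of descriptive statistics, the sketch below computes the mean, median and range of a data record using Python's standard library. The function name `describe` and the readings are made up for illustration; they are not from the lecture.

```python
import statistics

def describe(data):
    """Return a few numerical descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
    }

# Hypothetical measurements, invented purely for illustration.
readings = [4.1, 3.9, 4.4, 4.0, 5.6]
summary = describe(readings)
print(summary)
```

Note how the single large reading pulls the mean above the median, which is one reason several summaries are usually reported together.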

Sampling
Fundamental to statistical analysis is the know-how of how to "sample" and of the effects of sampling on the estimates (not to be confused with sampling in signal processing).
- Sampling: a study of the variability inherent in samples drawn from a population.
- Sampling distribution: a study of the distributions of key statistics (mean, variance).

Three central concepts
1. Random sample: $\mathbf{x} = \{x_1, x_2, \dots, x_n\}$. Essentially a single data record.
- All subsets should have equal chances of being selected. No particular preference towards a section of the population!
2. Statistic: Function of the observation set, $g(\mathbf{x})$, constructed for the purpose of estimating parameters. Sometimes denoted as $\hat{\theta}(\mathbf{x})$ (estimator function).
3. Distribution of the statistic: How the statistic $g(\mathbf{x})$ varies across samples or data records. Also known as the sampling distribution.

Random sample
Aim: Obtain a sample (subset) of n observations (n finite) from a population of size N (N could be infinite).
Primary consideration: All subsets should have equal chances of being selected. No particular preference towards a section of the population!
Definition: Let $X_1, X_2, \dots, X_n$ constitute n mutually, stochastically independent random variables, each of which stems from the same but possibly unknown pdf f(x). Then, this set of variables is a random sample.
The joint pdf of the random sample above is then
$$f(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f(x_i)$$
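The product form of the joint pdf can be sketched numerically. The example below assumes, purely for illustration, that the population is Gaussian with known mean and standard deviation; the lecture leaves f(x) unspecified, and the helper names `normal_pdf` and `joint_pdf` are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Gaussian random variable (an assumed form of f(x))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def joint_pdf(sample, mu, sigma):
    """Joint density of a random sample: the product of the marginal densities.

    This factorization is valid only because the observations are assumed
    independent and drawn from the same pdf.
    """
    return math.prod(normal_pdf(x, mu, sigma) for x in sample)

# Hypothetical data record of three observations.
record = [0.2, -0.5, 1.1]
print(joint_pdf(record, mu=0.0, sigma=1.0))
```

For long records the product underflows quickly, which is why in practice the sum of log-densities is usually computed instead.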

Statistic
The purpose of collecting data is to make inferences by a suitable processing of the data. For this purpose, introduce a "processor" function known as the statistic.
Definition: Any function of the random sample with a p.d.f. $f(x; \theta)$,
$$\hat{\theta} = g(X_1, X_2, \dots, X_n)$$
which does not require knowledge of the unknown parameters of f(x), is said to be a statistic.
Examples:
(i) $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
(ii) $Z = (\bar{X} - \mu)/\sigma$ (only if µ and σ are known).
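The distinction in the two examples can be sketched in code: the sample mean needs only the data, whereas Z can be evaluated only when µ and σ are supplied. The function names and numbers below are hypothetical, and Z follows the form given on the slide.

```python
def sample_mean(xs):
    """Sample mean: computable from the data alone, hence always a statistic."""
    return sum(xs) / len(xs)

def z_value(xs, mu, sigma):
    """Z = (sample mean - mu) / sigma, as written on the slide.

    mu and sigma must be passed in: if either is unknown, this quantity
    cannot be computed from the data and is therefore not a statistic.
    """
    return (sample_mean(xs) - mu) / sigma

# Hypothetical data record and assumed-known population parameters.
data = [2.0, 4.0, 6.0]
print(sample_mean(data))
print(z_value(data, mu=3.0, sigma=2.0))
```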

Points to note
- Knowing g(.), one can construct the p.d.f. of $\hat{\theta}$, which will then tell us the probabilistic characteristics of the statistic.
- Although $\hat{\theta}$ does not depend on the unknown parameters, its distribution does.
- Distributions of statistics are in general known as sampling distributions.
- Very useful for estimation and hypothesis testing.
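A sampling distribution can be approximated by simulation: draw many data records, compute the statistic on each, and look at how the values spread. The sketch below does this for the sample mean, assuming a Gaussian population with µ = 5 and σ = 2 and records of length n = 25; these values, the seed and the function name are illustrative choices, not taken from the lecture.

```python
import random
import statistics

def simulate_sample_means(n, replications, mu, sigma, seed=0):
    """Draw `replications` records of size n and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(replications):
        record = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(record) / n)
    return means

means = simulate_sample_means(n=25, replications=2000, mu=5.0, sigma=2.0)

# Centre and spread of the simulated sampling distribution.
centre = statistics.fmean(means)
spread = statistics.stdev(means)
print(centre, spread)
```

For independent observations the standard deviation of the sample mean is σ/√n, which is 0.4 here, so the simulated spread should land close to that value. Changing `mu` or `sigma` shifts or widens the histogram of `means`, which illustrates the point that the distribution of a statistic depends on the unknown parameters even though the statistic itself does not.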

Statistics of interest
Three popular statistics in univariate analysis:
1. Sample mean: To provide an estimate of the true average µ.
2. Sample variance: As an estimate of the true variability of X, $\sigma^2$.
3. Sample proportion: As an estimate of the population proportion.
and in multivariate analysis:
1. Sample correlation: Provides an estimate of the true correlation between two random variables X and Y (or the respective populations).
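The four statistics can be written out directly from their usual definitions. The sketch below is a minimal version with hypothetical function names and data; the slide does not state which divisor the sample variance uses, so the n - 1 divisor here is an assumption.

```python
import math

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the n - 1 divisor (an assumed convention)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_proportion(flags):
    """Fraction of observations having the attribute of interest."""
    return sum(1 for f in flags if f) / len(flags)

def sample_correlation(xs, ys):
    """Pearson sample correlation between paired observations."""
    mx, my = sample_mean(xs), sample_mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: paired measurements and a defective / non-defective record.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
defective = [True, False, False, True]

print(sample_mean(x), sample_variance(x))
print(sample_proportion(defective))
print(sample_correlation(x, y))
```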

Estimation
At the heart of any statistical analysis is an estimator. The role of the estimator is to produce an estimate given information and other user inputs.
Two popular estimation problems:
- Estimation of parameters of a distribution
- Estimation of model parameters (regression)
The subject is very broad: it applies across engineering, medicine, econometrics, etc.

Post-estimation analysis
Any estimate is a random variable and a function of the sample size. Statistical properties of the estimate qualify the goodness of an estimator:
- Accuracy: How accurate is the estimate on the average?
- Precision: What is the variability of the estimates obtained from different records?
- Does the given estimator produce an estimate with the least variability?
- What can we confidently say about the true value of $\theta$ (call it $\theta_0$) from the obtained estimate?
- Will the estimate converge (to the truth) as we increase the sample size?
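Some of these questions can be explored by simulation. The sketch below, under the assumption of a Gaussian population with µ = 0 and σ² = 4, compares two variance estimators on the average (accuracy) and checks how the spread of the sample mean changes with the sample size (precision and convergence). All names, seeds and parameter values are illustrative choices, not from the lecture.

```python
import random
import statistics

def average_variance_estimates(n, replications, mu, sigma, seed=1):
    """Average of the 1/n and 1/(n-1) variance estimators over many records."""
    rng = random.Random(seed)
    total_div_n, total_div_n1 = 0.0, 0.0
    for _ in range(replications):
        record = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(record) / n
        ss = sum((x - m) ** 2 for x in record)
        total_div_n += ss / n          # estimator that divides by n
        total_div_n1 += ss / (n - 1)   # estimator that divides by n - 1
    return total_div_n / replications, total_div_n1 / replications

def spread_of_sample_mean(n, replications, mu, sigma, seed=2):
    """Standard deviation of the sample mean across simulated records."""
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        record = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(record) / n)
    return statistics.stdev(means)

# Accuracy: the true variance is 4.0; compare the two averages against it.
avg_div_n, avg_div_n1 = average_variance_estimates(5, 20000, 0.0, 2.0)
print(avg_div_n, avg_div_n1)

# Precision and convergence: the spread shrinks as the record gets longer.
spread_small = spread_of_sample_mean(10, 1000, 0.0, 2.0)
spread_large = spread_of_sample_mean(100, 1000, 0.0, 2.0)
print(spread_small, spread_large)
```

With n = 5 the 1/n estimator averages to roughly (n - 1)/n times the true variance, that is, about 3.2 rather than 4.0, while the 1/(n - 1) estimator is accurate on the average. The shrinking spread of the sample mean with growing n is the behaviour the last bullet asks about.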

Introduction to Statistical Hypothesis Testing

Introduction to Statistical Hypothesis Testing Introduction to Statistical Hypothesis Testing Arun K. Tangirala Power of Hypothesis Tests Arun K. Tangirala, IIT Madras Intro to Statistical Hypothesis Testing 1 Learning objectives I Computing Pr(Type

More information

Confidence intervals and Hypothesis testing

Confidence intervals and Hypothesis testing Confidence intervals and Hypothesis testing Confidence intervals offer a convenient way of testing hypothesis (all three forms). Procedure 1. Identify the parameter of interest.. Specify the significance

More information

System Identification

System Identification System Identification Arun K. Tangirala Department of Chemical Engineering IIT Madras July 27, 2013 Module 3 Lecture 1 Arun K. Tangirala System Identification July 27, 2013 1 Objectives of this Module

More information

Factors affecting the Type II error and Power of a test

Factors affecting the Type II error and Power of a test Factors affecting the Type II error and Power of a test The factors that affect the Type II error, and hence the power of a hypothesis test are 1. Deviation of truth from postulated value, 6= 0 2 2. Variability

More information

Probability and Inference. POLI 205 Doing Research in Politics. Populations and Samples. Probability. Fall 2015

Probability and Inference. POLI 205 Doing Research in Politics. Populations and Samples. Probability. Fall 2015 Fall 2015 Population versus Sample Population: data for every possible relevant case Sample: a subset of cases that is drawn from an underlying population Inference Parameters and Statistics A parameter

More information

Introduction to Statistical Hypothesis Testing

Introduction to Statistical Hypothesis Testing Introduction to Statistical Hypothesis Testing Arun K. Tangirala Hypothesis Testing of Variance and Proportions Arun K. Tangirala, IIT Madras Intro to Statistical Hypothesis Testing 1 Learning objectives

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248)

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248) AIM HIGH SCHOOL Curriculum Map 2923 W. 12 Mile Road Farmington Hills, MI 48334 (248) 702-6922 www.aimhighschool.com COURSE TITLE: Statistics DESCRIPTION OF COURSE: PREREQUISITES: Algebra 2 Students will

More information

BNG 495 Capstone Design. Descriptive Statistics

BNG 495 Capstone Design. Descriptive Statistics BNG 495 Capstone Design Descriptive Statistics Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 4 Problems with small populations 9 II. Why Random Sampling is Important 10 A myth,

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

2) There should be uncertainty as to which outcome will occur before the procedure takes place.

2) There should be uncertainty as to which outcome will occur before the procedure takes place. robability Numbers For many statisticians the concept of the probability that an event occurs is ultimately rooted in the interpretation of an event as an outcome of an experiment, others would interpret

More information

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com 1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics, Appendix

More information

LECTURE 5. Introduction to Econometrics. Hypothesis testing

LECTURE 5. Introduction to Econometrics. Hypothesis testing LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Introduction to Design of Experiments

Introduction to Design of Experiments Introduction to Design of Experiments Jean-Marc Vincent and Arnaud Legrand Laboratory ID-IMAG MESCAL Project Universities of Grenoble {Jean-Marc.Vincent,Arnaud.Legrand}@imag.fr November 20, 2011 J.-M.

More information

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE Course Title: Probability and Statistics (MATH 80) Recommended Textbook(s): Number & Type of Questions: Probability and Statistics for Engineers

More information

Estimating trends using filters

Estimating trends using filters Estimating trends using filters... contd. 3. Exponential smoothing of data to estimate the trend m[k] ˆm[k] = v[k]+(1 )ˆm[k 1], k =2,, n ˆm[1] = v[1] The choice of has to be fine tuned according to the

More information

ENSC327 Communications Systems 19: Random Processes. Jie Liang School of Engineering Science Simon Fraser University

ENSC327 Communications Systems 19: Random Processes. Jie Liang School of Engineering Science Simon Fraser University ENSC327 Communications Systems 19: Random Processes Jie Liang School of Engineering Science Simon Fraser University 1 Outline Random processes Stationary random processes Autocorrelation of random processes

More information

Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning. Chris Cremer September 2015 Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Counting Statistics and Error Propagation!

Counting Statistics and Error Propagation! Counting Statistics and Error Propagation Nuclear Medicine Physics Lectures 10/4/11 Lawrence MacDonald, PhD macdon@uw.edu Imaging Research Laboratory, Radiology Dept. 1 Statistics Type of analysis which

More information

review session gov 2000 gov 2000 () review session 1 / 38

review session gov 2000 gov 2000 () review session 1 / 38 review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review

More information

ACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin

ACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin ACE 562 Fall 2005 Lecture 2: Probability, Random Variables and Distributions Required Readings: by Professor Scott H. Irwin Griffiths, Hill and Judge. Some Basic Ideas: Statistical Concepts for Economists,

More information

PHYS 6710: Nuclear and Particle Physics II

PHYS 6710: Nuclear and Particle Physics II Data Analysis Content (~7 Classes) Uncertainties and errors Random variables, expectation value, (co)variance Distributions and samples Binomial, Poisson, and Normal Distribution Student's t-distribution

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

MATH 450: Mathematical statistics

MATH 450: Mathematical statistics Departments of Mathematical Sciences University of Delaware August 28th, 2018 General information Classes: Tuesday & Thursday 9:30-10:45 am, Gore Hall 115 Office hours: Tuesday Wednesday 1-2:30 pm, Ewing

More information

1. Fundamental concepts

1. Fundamental concepts . Fundamental concepts A time series is a sequence of data points, measured typically at successive times spaced at uniform intervals. Time series are used in such fields as statistics, signal processing

More information

THE EFFECTS OF MULTICOLLINEARITY IN ORDINARY LEAST SQUARES (OLS) ESTIMATION

THE EFFECTS OF MULTICOLLINEARITY IN ORDINARY LEAST SQUARES (OLS) ESTIMATION THE EFFECTS OF MULTICOLLINEARITY IN ORDINARY LEAST SQUARES (OLS) ESTIMATION Weeraratne N.C. Department of Economics & Statistics SUSL, BelihulOya, Sri Lanka ABSTRACT The explanatory variables are not perfectly

More information

Stochastic calculus for summable processes 1

Stochastic calculus for summable processes 1 Stochastic calculus for summable processes 1 Lecture I Definition 1. Statistics is the science of collecting, organizing, summarizing and analyzing the information in order to draw conclusions. It is a

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only

More information

The science of learning from data.

The science of learning from data. STATISTICS (PART 1) The science of learning from data. Numerical facts Collection of methods for planning experiments, obtaining data and organizing, analyzing, interpreting and drawing the conclusions

More information

CH5350: Applied Time-Series Analysis

CH5350: Applied Time-Series Analysis CH5350: Applied Time-Series Analysis Arun K. Tangirala Department of Chemical Engineering, IIT Madras Spectral Representations of Random Signals Arun K. Tangirala (IIT Madras) Applied Time-Series Analysis

More information

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. Preface p. xi Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. 6 The Scientific Method and the Design of

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland EnviroInfo 2004 (Geneva) Sh@ring EnviroInfo 2004 Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland Mikhail Kanevski 1, Michel Maignan 1

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

outline Nonlinear transformation Error measures Noisy targets Preambles to the theory

outline Nonlinear transformation Error measures Noisy targets Preambles to the theory Error and Noise outline Nonlinear transformation Error measures Noisy targets Preambles to the theory Linear is limited Data Hypothesis Linear in what? Linear regression implements Linear classification

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

PSY 305. Module 3. Page Title. Introduction to Hypothesis Testing Z-tests. Five steps in hypothesis testing

PSY 305. Module 3. Page Title. Introduction to Hypothesis Testing Z-tests. Five steps in hypothesis testing Page Title PSY 305 Module 3 Introduction to Hypothesis Testing Z-tests Five steps in hypothesis testing State the research and null hypothesis Determine characteristics of comparison distribution Five

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

MATH2206 Prob Stat/20.Jan Weekly Review 1-2

MATH2206 Prob Stat/20.Jan Weekly Review 1-2 MATH2206 Prob Stat/20.Jan.2017 Weekly Review 1-2 This week I explained the idea behind the formula of the well-known statistic standard deviation so that it is clear now why it is a measure of dispersion

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Statistische Methoden der Datenanalyse. Kapitel 1: Fundamentale Konzepte. Professor Markus Schumacher Freiburg / Sommersemester 2009

Statistische Methoden der Datenanalyse. Kapitel 1: Fundamentale Konzepte. Professor Markus Schumacher Freiburg / Sommersemester 2009 Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 1 Statistische Methoden der Datenanalyse Kapitel 1: Fundamentale Konzepte Professor Markus Schumacher

More information

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career. Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Treatment and analysis of data Applied statistics Lecture 4: Estimation

Treatment and analysis of data Applied statistics Lecture 4: Estimation Treatment and analysis of data Applied statistics Lecture 4: Estimation Topics covered: Hierarchy of estimation methods Modelling of data The likelihood function The Maximum Likelihood Estimate (MLE) Confidence

More information

Fourier and Stats / Astro Stats and Measurement : Stats Notes

Fourier and Stats / Astro Stats and Measurement : Stats Notes Fourier and Stats / Astro Stats and Measurement : Stats Notes Andy Lawrence, University of Edinburgh Autumn 2013 1 Probabilities, distributions, and errors Laplace once said Probability theory is nothing

More information

Part 4: Multi-parameter and normal models

Part 4: Multi-parameter and normal models Part 4: Multi-parameter and normal models 1 The normal model Perhaps the most useful (or utilized) probability model for data analysis is the normal distribution There are several reasons for this, e.g.,

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

An introduction to biostatistics: part 1

An introduction to biostatistics: part 1 An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

Descriptive Univariate Statistics and Bivariate Correlation

Descriptive Univariate Statistics and Bivariate Correlation ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to

More information

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain

More information

Multiple Linear Regression for the Supervisor Data

Multiple Linear Regression for the Supervisor Data for the Supervisor Data Rating 40 50 60 70 80 90 40 50 60 70 50 60 70 80 90 40 60 80 40 60 80 Complaints Privileges 30 50 70 40 60 Learn Raises 50 70 50 70 90 Critical 40 50 60 70 80 30 40 50 60 70 80

More information

Northwestern University Department of Electrical Engineering and Computer Science

Northwestern University Department of Electrical Engineering and Computer Science Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Big Data Analysis with Apache Spark UC#BERKELEY

Big Data Analysis with Apache Spark UC#BERKELEY Big Data Analysis with Apache Spark UC#BERKELEY This Lecture: Relation between Variables An association A trend» Positive association or Negative association A pattern» Could be any discernible shape»

More information

Exploratory data analysis

Exploratory data analysis Exploratory data analysis November 29, 2017 Dr. Khajonpong Akkarajitsakul Department of Computer Engineering, Faculty of Engineering King Mongkut s University of Technology Thonburi Module III Overview

More information

0 0'0 2S ~~ Employment category

0 0'0 2S ~~ Employment category Analyze Phase 331 60000 50000 40000 30000 20000 10000 O~----,------.------,------,,------,------.------,----- N = 227 136 27 41 32 5 ' V~ 00 0' 00 00 i-.~ fl' ~G ~~ ~O~ ()0 -S 0 -S ~~ 0 ~~ 0 ~G d> ~0~

More information
