Some basic statistics and curve fitting techniques


Statistics is the discipline concerned with the study of variability, with the study of uncertainty, and with the study of decision-making in the face of uncertainty (Lindsay et al., 2004).
Statistics is the science of collecting, organizing, analyzing and interpreting data.
Nominal data: categories that are not ordered (e.g. taxa).
Ordinal data: fits in categories that are ordered, but the interval between levels has no objective measure (e.g. pain level).
Scale data: fits in categories that are ordered, with units of measure between levels (e.g. units such as m/s).

Why do we need statistics?
Statistics helps to provide answers to questions such as:
1. What is the concentration of plankton at the dock right now (given past measurements)?
2. Will species x be in the water tomorrow?
We are interested in the likelihood of the answer. Statistics also helps reduce large datasets to their salient characteristics.
The use of statistics to make a point:
1. Statistics never proves a point (it says something about likelihood).
2. If you need fancy statistics to support a point, your point is, at best, weak (Lazar, 1991, personal communication).

Why do we need statistics?
[Diagram: Population → (realization, sampling) → Sample. Description: parameters of the population ← (inference) ← statistics of the sample.]

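Statistical description of data
Statistical moments (1st and 2nd):
Mean: $\bar{x} = \frac{1}{N}\sum_{j=1}^{N} x_j$
Variance: $\mathrm{Var} = \frac{1}{N-1}\sum_{j=1}^{N} (x_j - \bar{x})^2$
Standard deviation: $\sigma = \sqrt{\mathrm{Var}}$
Average deviation: $\mathrm{Adev} = \frac{1}{N}\sum_{j=1}^{N} |x_j - \bar{x}|$
Standard error: $\sigma_{\mathrm{error}} = \sigma/\sqrt{N}$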

Standard error: $\sigma_{\mathrm{error}} = \sigma/\sqrt{N}$
When is the uncertainty not reduced by additional sampling?
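The moments above can be computed directly from their definitions. The following is a minimal Python sketch using only the standard library; the function name `describe` and the sample values are made up for illustration and are not from the slides.

```python
import math

def describe(x):
    """Return the mean, variance (N-1 normalization), standard deviation,
    average deviation and standard error of a list of numbers."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    std = math.sqrt(var)
    adev = sum(abs(v - mean) for v in x) / n
    stderr = std / math.sqrt(n)
    return {"mean": mean, "var": var, "std": std, "adev": adev, "stderr": stderr}

# Hypothetical sample, chosen only so the numbers are easy to check by hand.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = describe(sample)
```

Note that the standard error shrinks as $1/\sqrt{N}$ only if the samples are independent and the error is random; this sketch does not test for either condition.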

Statistical description of data. Probability distribution:

Non-normal probability distribution:

Statistical description of data
Nonparametric statistics (when the distribution is unknown):
Rank statistics: $x_1, x_2, \ldots, x_N \rightarrow 1, 2, \ldots, N$
Median
Percentile
Deviation estimate
The mode
Issue: robustness, sensitivity to outliers

Statistical description of data
Robust: insensitive to small departures from the idealized assumptions for which the estimator is optimized (Press et al., 1992, Numerical Recipes).
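To illustrate what robustness means in practice, the sketch below compares how the mean and the median react when a single value is replaced by an outlier. The measurements are invented for the example, and the interquartile range is used here as one possible rank-based deviation estimate (an assumption, since the slide does not say which estimate is meant).

```python
import statistics

# Made-up measurements; the last value of `spiked` is an artificial outlier.
clean = [4.8, 5.1, 4.9, 5.0, 5.2, 5.0, 4.9]
spiked = clean[:-1] + [50.0]

mean_shift = statistics.mean(spiked) - statistics.mean(clean)
median_shift = statistics.median(spiked) - statistics.median(clean)

# Quartiles of the contaminated sample; their difference is a rank-based
# measure of spread that ignores how extreme the outlier is.
q1, q2, q3 = statistics.quantiles(spiked, n=4)
iqr = q3 - q1
```

In this example the mean moves by several units while the median does not move at all, which is the sense in which rank statistics are insensitive to outliers.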

Statistical description of data
Example from COBOP, linking variability in IOPs to substrate: Boss and Zaneveld, 2003 (L&O)

What do we care about in research to which statistics can contribute?
Relationships between variables (e.g. do we get blooms when nutrients are plentiful?).
Contrast between conditions (e.g. is diatom vs. dinoflagellate domination associated with fresh water input?).

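Relationship between 2 variables
Linear correlation:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$
Rank-order correlation, where $R_i$ and $S_i$ are the ranks of $x_i$ and $y_i$:
$$r_s = \frac{\sum_i (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_i (R_i - \bar{R})^2}\,\sqrt{\sum_i (S_i - \bar{S})^2}}$$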
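Both coefficients can be sketched in a few lines of Python: the rank-order coefficient is simply the linear coefficient applied to ranks. The helper names (`pearson`, `ranks`, `spearman`) and the data are illustrative choices; ties are given their average rank, which is a common convention that the slide itself does not specify.

```python
import math

def pearson(x, y):
    """Linear correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..N; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank of the tied group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Rank-order correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# A monotonic but non-linear relationship (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
r_linear = pearson(x, y)
r_rank = spearman(x, y)
```

For this monotonic curve the rank-order coefficient is 1 while the linear coefficient is below 1, since only the ordering matters to the former.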

Relationship between 2 variables
Same mean, standard deviation, and r = 0.816 (Wilks, 2011).

Regressions (models): y = f(x)
Dependent and independent variables: Absorption spectra. Time series of scattering. What about chlorophyll vs. size?

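Regressions of type I and type II
Uncertainties in y only: $y(x) = ax + b$,
$$\chi^2 = \sum_{i=1}^{N} \left(\frac{y_i - a x_i - b}{\sigma_i}\right)^2$$
Minimize $\chi^2$ by taking the derivatives of $\chi^2$ with respect to a and b and setting them equal to zero.
What if we have errors in both x and y? $y(x) = ax + b$,
$$\chi^2 = \sum_{i=1}^{N} \frac{(y_i - a x_i - b)^2}{\mathrm{Var}(y_i - a x_i - b)}, \qquad \mathrm{Var}(y_i - a x_i - b) = \sigma_{y_i}^2 + a^2 \sigma_{x_i}^2$$
Minimize $\chi^2$ by taking the derivatives of $\chi^2$ with respect to a and b and setting them equal to zero.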
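For the type I case (uncertainties in y only), setting the two derivatives to zero gives a pair of linear equations with a closed-form solution. The sketch below implements that solution; the function name `fit_line_type1` and the data are hypothetical, and a is the slope and b the intercept as in $y(x) = ax + b$. The case with errors in both x and y is non-linear in a and is not covered here.

```python
def fit_line_type1(x, y, sigma):
    """Weighted least-squares line y = a*x + b, with sigma the
    standard deviation of each y measurement."""
    w = [1.0 / s ** 2 for s in sigma]              # weights 1/sigma_i^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    a = (S * Sxy - Sx * Sy) / delta                # slope
    b = (Sxx * Sy - Sx * Sxy) / delta              # intercept
    return a, b

# Made-up data lying exactly on y = 2x + 1, with unequal uncertainties.
a, b = fit_line_type1([0.0, 1.0, 2.0, 3.0],
                      [1.0, 3.0, 5.0, 7.0],
                      [0.1, 0.2, 0.1, 0.3])
```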

The coefficient of determination: $R^2 = 1 - \mathrm{MSE}/\mathrm{Var}(y)$.
MSE = mean square error = the average of the squared model error, which is then divided by the variance of y.
What variance does it explain? Can it reveal cause and effect? How is it affected by dynamic range?
R is the correlation coefficient.
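As a small numerical illustration, the following sketch evaluates $R^2$ for a set of observations and model predictions. It assumes the same 1/N normalization for both the MSE and the variance, so that the ratio reduces to a ratio of sums of squares; the function name `r_squared` is an illustrative choice.

```python
def r_squared(y, y_model):
    """R^2 = 1 - MSE / Var(y), both normalized by N (an assumption)."""
    n = len(y)
    mean_y = sum(y) / n
    mse = sum((obs - mod) ** 2 for obs, mod in zip(y, y_model)) / n
    var_y = sum((obs - mean_y) ** 2 for obs in y) / n
    return 1.0 - mse / var_y

y_obs = [1.0, 2.0, 3.0, 4.0]                      # made-up observations
perfect = r_squared(y_obs, y_obs)                 # model reproduces the data
mean_only = r_squared(y_obs, [2.5, 2.5, 2.5, 2.5])  # model is just the mean
```

A model that only returns the mean of y explains none of the variance, which is one way to see why a small dynamic range in y tends to give a low $R^2$.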

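Regressions of type I and type II
Classic type II approach (Ricker, 1973): the slope of the type II regression is the geometric mean of the slope of y vs. x and the inverse of the slope of x vs. y.
$$y(x) = ax + b, \qquad x(y) = cy + d, \qquad a_{II} = \pm\sqrt{a/c} = \mathrm{sgn}\left\{\sum_i x_i y_i\right\}\frac{\sigma_y}{\sigma_x}$$
(In the sign term, $x_i$ and $y_i$ are taken relative to their means.)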
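The geometric-mean slope can be sketched as follows. The intercept is chosen so that the line passes through the point of means, which is a common convention but an assumption here, since the slide only gives the slope. Function names and data are illustrative.

```python
import math

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def type2_line(x, y):
    """Geometric-mean (type II) regression: slope = sgn(cov) * sigma_y / sigma_x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = my - slope * mx        # assumed: line passes through the means
    return slope, intercept

x = [1.0, 2.0, 3.0, 4.0, 5.0]          # made-up data
y = [1.2, 1.9, 3.2, 3.8, 5.1]
a = ols_slope(x, y)                    # slope of y vs. x
c = ols_slope(y, x)                    # slope of x vs. y
a_II, b_II = type2_line(x, y)
```

For positively correlated data the type II slope always lies between a and 1/c, the two type I slopes expressed in the same units.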

Filtering noisy signals. Smoothing of data
What is noise? Instrumental (electronic) noise. Environmental noise.
One person's noise may be another person's signal.
Matlab: filtfilt
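The idea behind forward-backward (zero-phase) filtering can be sketched in plain Python: run a low-pass filter forward through the data, then run it again over the reversed result so that the time lag of the first pass is cancelled. This is only an illustration of the principle with a single-pole filter; it is not Matlab's `filtfilt` implementation, which accepts arbitrary filter coefficients and treats the end points more carefully. The names `lowpass`, `forward_backward` and the smoothing constant `alpha` are assumptions of this sketch.

```python
def lowpass(x, alpha):
    """Single-pole low-pass filter: y[n] = alpha*x[n] + (1-alpha)*y[n-1].
    The first output is initialized to the first input (a crude edge choice)."""
    y = [x[0]]
    for v in x[1:]:
        y.append(alpha * v + (1.0 - alpha) * y[-1])
    return y

def forward_backward(x, alpha):
    """Filter forward, then backward, so the two phase lags cancel."""
    forward = lowpass(x, alpha)
    backward = lowpass(forward[::-1], alpha)
    return backward[::-1]

# A unit impulse in the middle of a quiet record (made-up signal).
signal = [0.0] * 201
signal[100] = 1.0
smoothed = forward_backward(signal, 0.3)
```

After both passes the smoothed impulse is still centered on its original position, whereas a single forward pass would smear it toward later times only.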

Lab aggregation experiment: method of fluctuation
Sample volume. Measurement time.
Briggs et al., 2013

Modeling of data
Condense/summarize data by fitting it to a model that depends on adjustable parameters.
Example, CDM spectra: $a_g(\lambda) = \tilde{a}_g \exp(-s(\lambda - \lambda_0))$
Particulate attenuation spectra: $c_p(\lambda) = \tilde{c}_p \left(\frac{\lambda}{\lambda_0}\right)^{-\gamma}$

Modeling of data
Example: CDM spectra, $a_g(\lambda) = \tilde{a}_g \exp(-s(\lambda - \lambda_0))$.
Merit function:
$$\chi^2 = \sum_{i=1}^{9} \left[a_g(\lambda_i) - \tilde{a}_g \exp(-s(\lambda_i - \lambda_0))\right]^2 \Rightarrow \tilde{a}_g, s$$
For non-linear models, there is no guarantee of a single minimum. An initial guess must be provided.
Matlab: fminsearch
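A dependency-free sketch of minimizing this merit function is given below. It does not use the simplex method behind Matlab's `fminsearch`; instead it exploits the fact that, for a fixed slope s, the model is linear in the amplitude, and then runs a golden-section search over s alone. The search bracket `[s_lo, s_hi]` plays the role of the initial guess and is assumed to contain a single minimum. The nine wavelengths, the reference wavelength and the "true" parameters are hypothetical values used to generate synthetic data.

```python
import math

def best_amplitude(lams, a_meas, s, lam0):
    """For a fixed slope s the model is linear in the amplitude,
    so the least-squares amplitude has a closed form."""
    basis = [math.exp(-s * (lam - lam0)) for lam in lams]
    return sum(a * b for a, b in zip(a_meas, basis)) / sum(b * b for b in basis)

def chi2(lams, a_meas, amp, s, lam0):
    """Unweighted merit function: sum of squared residuals."""
    return sum((a - amp * math.exp(-s * (lam - lam0))) ** 2
               for lam, a in zip(lams, a_meas))

def fit_cdm(lams, a_meas, lam0, s_lo=0.001, s_hi=0.05, tol=1e-10):
    """Golden-section search on s; assumes one minimum inside [s_lo, s_hi]."""
    def cost(s):
        return chi2(lams, a_meas, best_amplitude(lams, a_meas, s, lam0), s, lam0)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = s_lo, s_hi
    while hi - lo > tol:
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if cost(c) < cost(d):
            hi = d          # minimum lies in [lo, d]
        else:
            lo = c          # minimum lies in [c, hi]
    s = 0.5 * (lo + hi)
    return best_amplitude(lams, a_meas, s, lam0), s

# Hypothetical wavelengths (nm) and noise-free synthetic absorption values.
lams = [412.0, 440.0, 488.0, 510.0, 532.0, 555.0, 650.0, 676.0, 715.0]
lam0 = 440.0
a_meas = [0.2 * math.exp(-0.015 * (lam - lam0)) for lam in lams]
amp_fit, s_fit = fit_cdm(lams, a_meas, lam0)
```

If the bracket does not contain the minimum, or the merit function has several minima, the search converges to whatever lies inside the bracket, which is the practical meaning of the warning about initial guesses.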

Modeling of data
Let's assume that we have a model $y = y(\lambda; \mathbf{a})$.
A more robust merit function:
$$\tilde{\chi} = \sum_{i=1}^{N} \left|\frac{y_i - y(\lambda_i; \mathbf{a})}{\sigma_i}\right|$$
Problem: the derivative is not continuous.
Can be used to fit lines.
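A line fit with this absolute-deviation merit function can be sketched without derivatives. For a fixed slope, the best intercept is the median of the residuals, and the remaining one-dimensional problem in the slope is convex, so a golden-section search is sufficient. The sketch assumes equal uncertainties ($\sigma_i = 1$) and a user-supplied slope bracket; the function name `lad_line` and the data are illustrative.

```python
import math
import statistics

def lad_line(x, y, slope_lo, slope_hi, tol=1e-10):
    """Least-absolute-deviation line y = a*x + b (equal sigmas assumed)."""
    def intercept(a):
        # For a fixed slope, the median residual minimizes the absolute deviations.
        return statistics.median([yi - a * xi for xi, yi in zip(x, y)])

    def cost(a):
        b = intercept(a)
        return sum(abs(yi - a * xi - b) for xi, yi in zip(x, y))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = slope_lo, slope_hi
    while hi - lo > tol:
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if cost(c) < cost(d):
            hi = d
        else:
            lo = c
    a = 0.5 * (lo + hi)
    return a, intercept(a)

# Made-up data on y = 2x + 1, with the last point replaced by a gross outlier.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
y[9] = 100.0
a_lad, b_lad = lad_line(x, y, -10.0, 20.0)
```

In this example the fit recovers the underlying line despite the outlier, whereas an ordinary least-squares slope would be pulled strongly toward it.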

Statistical description of data
Press et al., 1992

Monte-Carlo/Bootstrap methods
Need to establish confidence intervals in:
1. Fitting-model parameters (e.g. CDM fit).
2. Model output (e.g. Hydrolight).
[Diagram: in → out]

Bootstrap
When there is an uncertainty (or possible error) associated with the input, vary the inputs with random errors and observe the effect on the output:
in_1 → out_1
in_2 → out_2
in_3 → out_3
...
in_N → out_N

Bootstrap
Example: how to assign uncertainties to the derived spectral slope of CDOM.
Merit function:
$$\chi^2 = \sum_{i=1}^{9} \left[a_g(\lambda_i) \pm \Delta_i - \tilde{a}_g \exp(-s(\lambda_i - \lambda_0))\right]^2$$
Randomly add uncertainties ($\Delta_i$) to each measurement, each time performing the fit (e.g. using randn.m in Matlab, RAND in Excel). Then do the stats for the different s.
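The procedure described above can be sketched as follows: perturb each measurement with a random error, refit, and collect the fitted slopes. The sketch uses Python's `random.gauss` in place of `randn`, assumes Gaussian errors with the same standard deviation `delta` at every wavelength, and uses a compact golden-section fit as a stand-in for whatever fitting routine is actually used. Wavelengths, true parameters, `delta` and the number of repetitions are all hypothetical.

```python
import math
import random
import statistics

def fit_cdm(lams, a_meas, lam0, s_lo=0.0, s_hi=0.05, tol=1e-9):
    """Fit a*exp(-s*(lam-lam0)); amplitude in closed form, slope by
    golden-section search (assumes one minimum inside the bracket)."""
    def amp(s):
        basis = [math.exp(-s * (l - lam0)) for l in lams]
        return sum(a * b for a, b in zip(a_meas, basis)) / sum(b * b for b in basis)

    def cost(s):
        A = amp(s)
        return sum((a - A * math.exp(-s * (l - lam0))) ** 2
                   for l, a in zip(lams, a_meas))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = s_lo, s_hi
    while hi - lo > tol:
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if cost(c) < cost(d):
            hi = d
        else:
            lo = c
    s = 0.5 * (lo + hi)
    return amp(s), s

random.seed(0)                         # fixed seed so the run is repeatable
lams = [412.0, 440.0, 488.0, 510.0, 532.0, 555.0, 650.0, 676.0, 715.0]
lam0 = 440.0
measured = [0.1 * math.exp(-0.015 * (l - lam0)) for l in lams]  # synthetic "data"
delta = 0.002                          # assumed measurement uncertainty

slopes = []
for _ in range(500):
    noisy = [a + random.gauss(0.0, delta) for a in measured]
    slopes.append(fit_cdm(lams, noisy, lam0)[1])

s_mean = statistics.mean(slopes)
s_std = statistics.stdev(slopes)       # spread of s = uncertainty estimate
```

The standard deviation of the collected slopes, or percentiles of their distribution, then serves as the uncertainty of the derived spectral slope.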

Summary
Use statistics logically. If you don't know the underlying distribution, use non-parametric stats.
Statistics does not prove anything, but it can give you a sense of the likelihood of a hypothesis (about relationships).
I strongly encourage you to study hypothesis tests and Bayesian methods. Beware that they are often misused.