Model Fitting. Jean Yves Le Boudec



Contents
1. What is model fitting?
2. Linear regression
3. Linear regression with L1 norm minimization
4. Choosing a distribution
5. Heavy tail

Virus Infection Data
We would like to capture the growth of the number of infected hosts (explanatory model).
An exponential model seems appropriate.
How can we fit the model? In particular, what is the value of the growth rate α?

Least Square Fit of Virus Infection Data
Growth rate α = 0.5173 per hour
Mean doubling time: 1.34 hours
Prediction at +6 hours: 100 000 hosts
[Figure: least square fit]

Least Square Fit of Virus Infection Data in Log Scale
Growth rate α = 0.39 per hour
Mean doubling time: 1.77 hours
Prediction at +6 hours: 39 000 hosts
[Figure: least square fit]

Compare the Two
[Figure panels: LS fit in natural scale; LS fit in log scale]
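The two fits can be reproduced in spirit with a short sketch. The data below are synthetic (the virus data set is not part of this transcript), and the model form y = a·exp(α t), the function names and the grid search over α are assumptions made for illustration.

```python
import math
import random


def fit_log_scale(t, y):
    """Least squares on log(y) = log(a) + alpha * t (ordinary linear regression)."""
    n = len(t)
    z = [math.log(v) for v in y]
    t_bar = sum(t) / n
    z_bar = sum(z) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxz = sum((ti - t_bar) * (zi - z_bar) for ti, zi in zip(t, z))
    alpha = sxz / sxx
    return math.exp(z_bar - alpha * t_bar), alpha


def fit_natural_scale(t, y, alpha_max=2.0, steps=2000):
    """Least squares on y = a * exp(alpha * t), by grid search over alpha.

    For a fixed alpha the best a has a closed form, so only alpha is searched.
    """
    best = None
    for k in range(1, steps + 1):
        alpha = alpha_max * k / steps
        e = [math.exp(alpha * ti) for ti in t]
        a = sum(yi * ei for yi, ei in zip(y, e)) / sum(ei * ei for ei in e)
        sse = sum((yi - a * ei) ** 2 for yi, ei in zip(y, e))
        if best is None or sse < best[0]:
            best = (sse, a, alpha)
    return best[1], best[2]


def doubling_time(alpha):
    """Time for a * exp(alpha * t) to double."""
    return math.log(2) / alpha


# Synthetic data (hypothetical): exponential growth with multiplicative noise.
rng = random.Random(42)
t = [0.5 * k for k in range(20)]
y = [10.0 * math.exp(0.5 * ti) * math.exp(rng.gauss(0.0, 0.3)) for ti in t]

for name, fit in (("natural scale", fit_natural_scale), ("log scale", fit_log_scale)):
    a, alpha = fit(t, y)
    print(name, "alpha =", round(alpha, 4), "doubling time =", round(doubling_time(alpha), 2))
```

The two criteria generally give different estimates on the same data, which is the point of the comparison: the choice between them is a choice of noise model.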

Which Fitting Method Should I Use?
Which optimization criterion should I use?
The answer is in a statistical model: model not only the interesting part, but also the noise.
For example, a model with noise added in natural scale gives α = 0.5173, while a model with noise added in log scale gives α = 0.39.

How can I tell which is correct?

Look at Residuals = validate model

Least Square Fit = Gaussian iid Noise
Assume a model with iid Gaussian noise of the same variance for all data points (homoscedasticity).
The theorem says: minimizing least squares = computing the MLE for this model.
This is how we computed the estimates for the virus example.
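The equivalence can be checked in two lines. The notation is an assumption introduced here, since the slide's formula did not survive transcription: y_i is the data, f_i(β) the noise-free model output for parameter β, and σ² the common noise variance.

```latex
% Model (assumed notation): y_i = f_i(\beta) + \epsilon_i, with \epsilon_i iid N(0, \sigma^2)
\[
\ell(\beta, \sigma)
  = -\frac{n}{2}\ln\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f_i(\beta)\bigr)^2
\]
% For any fixed \sigma, only the last term depends on \beta, hence
\[
\hat{\beta}
  = \arg\max_{\beta}\,\ell(\beta, \sigma)
  = \arg\min_{\beta}\sum_{i=1}^{n}\bigl(y_i - f_i(\beta)\bigr)^2
\]
```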

Least Square and Projection
[Figure labels: data point; predicted response; manifold (where the data point would lie if there were no noise); estimated parameter]
Write on the board what they are: data point, predicted response and estimated parameter for the virus example.

Confidence Intervals

Robustness to «Outliers»

A Simple Example
Least square. Model: Gaussian noise. What is m? Confidence interval?
L1 norm minimization. Model: Laplace noise. What is m? Confidence interval?

Mean Versus Median
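A minimal sketch of why the two criteria lead to the mean and the median. It assumes the simplest model, data = m + noise, and uses a made-up sample with one outlier; the function names are illustrative.

```python
import statistics


def l2_cost(m, data):
    """Sum of squared deviations from m (least square criterion)."""
    return sum((x - m) ** 2 for x in data)


def l1_cost(m, data):
    """Sum of absolute deviations from m (L1 norm criterion)."""
    return sum(abs(x - m) for x in data)


# Hypothetical sample with one outlier (not from the slides).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

mean = statistics.mean(data)      # minimizes l2_cost
median = statistics.median(data)  # minimizes l1_cost

print("mean =", mean, "median =", median)
print("L2 cost at mean:", l2_cost(mean, data), "at median:", l2_cost(median, data))
print("L1 cost at mean:", l1_cost(mean, data), "at median:", l1_cost(median, data))
```

The single outlier drags the mean far from the bulk of the data, while the median stays put. This is the robustness that L1 norm minimization buys.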

2. Linear Regression
Also called «ANOVA» (Analysis of Variance).
Linear regression = least squares + linear dependence on the parameter.
A special case where computations are easy.

Example 4.3
What is the parameter?
Is it a linear model?
How many degrees of freedom?
What do we assume about the noise terms ε_i?
What is the matrix X?

Does this model have full rank?

Some Terminology
The x_i are called explanatory variables. They are assumed fixed and known.
The y_i are called response variables. They are «the data», assumed to be one sample output of the model.

Least Square and Projection
[Figure labels: data point; predicted response; manifold (where the data point would lie if there were no noise); estimated parameter]

Solution of the Linear Regression Model

Least Square and Projection
The theorem gives H and K.
[Figure labels: residuals; data; predicted response; manifold (where the data point would lie if there were no noise); estimated parameter]

The Theorem Gives the Estimated Parameter with a Confidence Interval

SSR
Confidence intervals use the quantity s.
s^2 is called «Sum of Squared Residuals».
[Figure labels: residuals; data; predicted response]

Validate the Assumptions with Residuals

Residuals
Residuals are given by the theorem.
[Figure labels: residuals; data; predicted response]

Standardized Residuals
The residuals e_i are an estimate of the noise terms ε_i.
They are not (exactly) normal iid.
The variance of e_i is? A: σ^2 (1 − H_{i,i}).
Standardized residuals are not exactly normal iid either, but their variance is 1.
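A sketch of these quantities for the simplest case, a straight line y = intercept + slope·x, where the diagonal of H has a closed form. The data are made up, and dividing the sum of squared residuals by n − 2 is the usual unbiased convention, assumed here; the slides' own normalization of s did not survive transcription.

```python
import math


def simple_linear_regression(x, y):
    """Least square fit of y = intercept + slope * x, with residual diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    # Diagonal of H = X (X^T X)^{-1} X^T when X has columns [1, x].
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    ssr = sum(e * e for e in residuals)
    sigma_hat = math.sqrt(ssr / (n - 2))  # assumed convention: n - 2 degrees of freedom

    # Each residual has variance sigma^2 * (1 - H_ii); dividing it out gives variance 1.
    standardized = [
        e / (sigma_hat * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverage)
    ]
    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "leverage": leverage,
        "standardized": standardized,
    }


# Hypothetical data, roughly y = 1 + 2x plus small noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
fit = simple_linear_regression(x, y)
print("slope =", round(fit["slope"], 3), "intercept =", round(fit["intercept"], 3))
print("standardized residuals:", [round(r, 2) for r in fit["standardized"]])
```

The leverages sum to the number of parameters (2 here), so points far from the mean of x have smaller residual variance and their raw residuals understate the noise.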

Which of these two models could be a linear regression model?
A: both. Linear regression does not mean that y_i is a linear function of x_i.
Caution: there is a hidden assumption. The noise is iid Gaussian → homoscedasticity.

3. Linear Regression with L1 Norm
= L1 norm minimization + linear dependency on the parameter
More robust
Less traditional minimization

This is convex programming
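As an illustration of L1 norm regression for a straight line: the problem can be written as a linear program, so some optimal line passes through at least two data points. The sketch below exploits that by brute force over all pairs. This is not how a convex solver proceeds and it is only practical for small data sets. Data and names are hypothetical.

```python
from itertools import combinations


def l1_cost(intercept, slope, x, y):
    """Sum of absolute residuals for the line y = intercept + slope * x."""
    return sum(abs(yi - intercept - slope * xi) for xi, yi in zip(x, y))


def l1_line_fit(x, y):
    """Exact L1 fit of a line by checking every line through two data points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not of the form y = intercept + slope * x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        cost = l1_cost(intercept, slope, x, y)
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


# Hypothetical data: points on y = 1 + 2x, with one gross outlier at x = 3.
x = [0, 1, 2, 3, 4, 5, 6]
y = [2 * xi + 1 for xi in x]
y[3] = 50

intercept, slope = l1_line_fit(x, y)
print("L1 fit: intercept =", intercept, "slope =", slope)
```

On this data the L1 fit recovers the underlying line and leaves the whole error on the outlier.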

Confidence Intervals
No closed form.
Compare to the median!
Bootstrap: how?
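One possible answer to "how?", sketched for the median, which is the simplest L1 estimator: resample the data with replacement, recompute the estimate each time, and read off percentiles. The percentile variant, the number of resamples and the data are all assumptions. For L1 regression one would resample (x_i, y_i) pairs and refit in the same way.

```python
import random
import statistics


def bootstrap_percentile_ci(data, estimator, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lower, upper


# Hypothetical sample (not from the slides).
data = [2.1, 3.4, 1.9, 5.6, 2.8, 3.1, 4.4, 2.2, 9.7, 3.0, 2.6, 3.9, 4.1, 2.4, 3.3]

lower, upper = bootstrap_percentile_ci(data, statistics.median)
print("median =", statistics.median(data), "95% bootstrap CI =", (lower, upper))
```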

4. Choosing a Distribution
Know a catalog of distributions, guess a fit:
Shape
Kurtosis, skewness
Power laws
Hazard rate
Fit
Verify the fit visually or with a test (see later)

Distribution Shape
Distributions have a shape. By definition, the shape is what remains the same when we shift or rescale.
Example: normal distribution: what is the shape parameter?
Example: exponential distribution: what is the shape parameter?

Standard Distributions
In a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard.
Standard normal: N(0,1)
Standard exponential: Exp(1)
Standard uniform: U(0,1)

Log Normal Distribution

Skewness and Kurtosis
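A small sketch of how these two shape indices can be computed from a sample. It uses the common moment-based definitions (third and fourth central moments normalized by powers of the variance, with 3 subtracted so that the normal distribution has kurtosis 0). The slide's own formulas are not in the transcript, so treat the normalization as an assumption.

```python
def central_moment(data, k):
    """k-th central moment of the sample (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def skewness(data):
    """Third central moment divided by variance^(3/2); 0 for symmetric samples."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth central moment divided by variance^2, minus 3 (0 for a normal)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_tailed = [1.0, 1.0, 2.0, 2.0, 3.0, 20.0]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

Both indices are unchanged by shifting or rescaling the data, which is why they describe shape only.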

Power Laws and Pareto Distribution

Complementary Distribution Functions
[Figure: complementary distribution functions in log-log scales for the lognormal, Pareto and normal distributions]
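To see why log-log scales are used: for a Pareto distribution the complementary distribution function is a power of x, so it appears as a straight line of slope −p. The sketch below assumes the standard Pareto on [1, ∞) with exponent p, and a synthetic sample; the function names are illustrative.

```python
import math
import random


def pareto_ccdf(x, p):
    """P(X > x) for the standard Pareto distribution with exponent p on [1, inf)."""
    return 1.0 if x < 1.0 else x ** (-p)


def empirical_ccdf(sample):
    """Pairs (x, fraction of the sample greater than x), assuming no ties."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, (n - k - 1) / n) for k, x in enumerate(xs)]


# Synthetic sample (hypothetical parameters).
p = 1.5
rng = random.Random(1)
sample = [rng.paretovariate(p) for _ in range(10000)]

# Compare the empirical and the exact values at a few points, in log10 units.
points = empirical_ccdf(sample)
for k in (5000, 9000, 9900, 9990):
    x, frac = points[k]
    print(
        round(math.log10(x), 3),
        round(math.log10(frac), 3),
        round(math.log10(pareto_ccdf(x, p)), 3),
    )
```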

Zipf's Law

Hazard Rate
Interpretation: probability that a flow dies in the next dt seconds given that it is still alive.
Used to classify distributions: aging, memoryless, fat tail.
Example: normal? Exponential? Pareto? Log normal?

The Weibull Distribution
Standard Weibull CDF: F(x) = 1 − exp(−x^c) for x ≥ 0
Aging for c > 1
Memoryless for c = 1
Fat tailed for c < 1
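The three regimes can be checked numerically from the definition hazard rate = density / complementary distribution function. The sketch assumes the usual standard Weibull form F(x) = 1 − exp(−x^c), for which the hazard rate works out to c·x^(c−1).

```python
import math


def weibull_pdf(x, c):
    """Density of the standard Weibull distribution with shape c, for x > 0."""
    return c * x ** (c - 1.0) * math.exp(-(x ** c))


def weibull_ccdf(x, c):
    """P(X > x) for the standard Weibull distribution with shape c."""
    return math.exp(-(x ** c))


def hazard_rate(x, c):
    """Density divided by the complementary distribution function."""
    return weibull_pdf(x, c) / weibull_ccdf(x, c)


for c, label in ((2.0, "aging"), (1.0, "memoryless"), (0.5, "fat tailed")):
    rates = [round(hazard_rate(x, c), 3) for x in (0.5, 1.0, 2.0, 4.0)]
    print(label, "c =", c, "hazard at x = 0.5, 1, 2, 4:", rates)
```

An increasing hazard rate means the longer a flow has lived, the more likely it is to end soon; a decreasing one means long-lived flows tend to live even longer.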

Fitting a Distribution
Assume iid.
Use maximum likelihood. Example: assume Gaussian; what are the parameters?
Frequent issues: censoring, combinations.
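For the Gaussian example the maximum likelihood estimates have a closed form: the sample mean and the variance with divisor n (not n − 1). A log normal fit follows by applying the same estimator to the logarithms of the data. A minimal sketch, with illustrative function names and made-up data:

```python
import math
import statistics


def gaussian_mle(data):
    """MLE of (mean, variance) for iid Gaussian data; variance uses divisor n."""
    mu = statistics.fmean(data)
    sigma2 = statistics.pvariance(data, mu)
    return mu, sigma2


def lognormal_mle(data):
    """MLE of the log normal parameters: Gaussian MLE applied to log(data)."""
    return gaussian_mle([math.log(x) for x in data])


# Hypothetical positive data (not from the slides).
data = [1.2, 0.7, 3.5, 2.2, 0.9, 5.1, 1.8, 1.1]
print("Gaussian fit:", gaussian_mle(data))
print("Log normal fit:", lognormal_mle(data))
```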

Censored Data
We want to fit a log normal distribution, but we have only data samples with values less than some maximum.
Lognormal is fat tailed, so we cannot ignore the tail.
Idea: model the data as drawn from a distribution F0 truncated at a, and estimate F0 and a (truncation threshold).

Combinations
We want to fit a log normal distribution to the body and a Pareto distribution to the tail.
[Equations: the model, and the condition that the MLE satisfies]

5. Heavy Tails
Recall what a fat tail is.
Heavier than fat:

Heavy Tail Means Central Limit Does Not Hold
Central limit theorem: a sum of n independent random variables with finite second moment tends to have a normal distribution when n is large.
This explains why we can often use the normal assumption.
But it does not always hold: it does not hold if the random variables have infinite second moment.
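A small simulation sketch of the contrast, using the standard Pareto distribution with exponent p, whose second moment p/(p − 2) is finite only for p > 2. The sample sizes and seed are arbitrary choices; with p = 1.5 the averages typically show occasional very large values instead of clustering tightly as they do with p = 3.

```python
import math
import random
import statistics


def pareto_second_moment(p):
    """E[X^2] for the standard Pareto on [1, inf): finite only when p > 2."""
    return p / (p - 2.0) if p > 2.0 else math.inf


def sample_averages(p, n_points=1000, n_averages=200, seed=0):
    """Averages of n_points iid Pareto(p) values, repeated n_averages times."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.paretovariate(p) for _ in range(n_points)])
        for _ in range(n_averages)
    ]


for p in (1.5, 3.0):
    averages = sample_averages(p)
    print(
        "p =", p,
        "second moment =", pareto_second_moment(p),
        "min / median / max of averages =",
        round(min(averages), 2),
        round(statistics.median(averages), 2),
        round(max(averages), 2),
    )
```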

Central Limit Theorem for Heavy Tails
[Figure panels: normal qqplot; histogram; complementary d.f. in log-log scale. One sample of 10000 points, Pareto, p = 1]

[Figure panels for p = 1, 1.5, 2, 2.5 and 3: one sample of 10000 points; average of 1000 samples]

Convergence for Heavy Tailed Distributions

Importance of Second Moment

RWP with Heavy Tail
Stationary?

Evidence of Heavy Tail

Testing Heavy Tail
Assume you have a very large data set; else no statement can be made.
One can look at the empirical cdf in log scale.

Taqqu's Method
A better method (numerically safer) is as follows.
Aggregate the data multiple times.

We should have … and …
If log(m2 / m1) …, then measure p = … / …
p_est = average of all p's

Example
[Figure annotations: log(2); log(2) / p]

Evidence of Heavy Tail
p = 1.08 ± 0.1

A Load Generator: Surge
Designed to create load for a web server.
Used in the next lab.
Sophisticated load model.
It is an example of a benchmark; there are many others (see lecture).

User Equivalent Model
Idea: find a stochastic model that represents the user well.
The user is modelled as a sequence of downloads, followed by think time.
The tool can implement several user equivalents.
Used to generate real work over TCP connections.

Characterization of UE
Weibull distributions

Successive file requests are not independent.
Q: What would be the distribution if they were independent?
A: geometric

Fitting the Distributions
Done by the Surge authors with the aest tool + ad hoc (least square fit of histogram).
What other method could one use?
A: maximum likelihood with numerical optimization; the issue is non-iid-ness.
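As an illustration of "maximum likelihood with numerical optimization", here is a sketch for a two-parameter Weibull distribution with CDF 1 − exp(−(x/scale)^shape). It assumes iid data, which is exactly the assumption the slide flags as problematic, and it is not the method used by the Surge authors. The shape is found by bisection on the likelihood equation, and the scale then follows in closed form.

```python
import math
import random


def weibull_mle(data, lo=0.01, hi=100.0, tol=1e-10):
    """MLE of (shape, scale) for iid positive Weibull data, by bisection on the shape."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mean_log = sum(logs) / n
    max_log = max(logs)

    def score(c):
        # Likelihood equation for the shape c:
        #   sum(x^c * log x) / sum(x^c) - 1/c - mean(log x) = 0
        # Weights x^c are rescaled by max(x)^c to avoid overflow.
        w = [math.exp(c * (l - max_log)) for l in logs]
        return sum(wi * l for wi, l in zip(w, logs)) / sum(w) - 1.0 / c - mean_log

    # Assumes the root lies in [lo, hi]; score is increasing in c.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)

    # Given the shape, scale^shape is the mean of x^shape.
    w_mean = sum(math.exp(shape * (l - max_log)) for l in logs) / n
    scale = math.exp(max_log) * w_mean ** (1.0 / shape)
    return shape, scale


# Synthetic check (hypothetical parameters): shape 1.5, scale 2.0.
rng = random.Random(0)
sample = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
shape, scale = weibull_mle(sample)
print("estimated shape =", round(shape, 3), "estimated scale =", round(scale, 3))
```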