Model Selection in Amplitude Analysis

Slide 1: Model Selection in Amplitude Analysis. Mike Williams, Department of Physics & Laboratory for Nuclear Science, Massachusetts Institute of Technology. PWA/ATHOS, April 2015.

Slide 2: Overview. This talk summarizes a paper I worked on with B. Guegan, J. Hardin and J. Stevens; it should appear on the arXiv next week. Everybody knows that one must truncate the set of waves considered; this is true of regression analyses performed in all fields of study. How do you do this? In physics the stepwise approach is by far the most common; however, general studies by statisticians show that these approaches tend to produce overly simplified models... is there a better way? Beyond this, if you want to discover a contribution, it's desirable to quote a statistical significance. Is this possible? Follow-up to: J. Stevens & MW [5.78]; MW [6.9].

Paper: "Model selection for amplitude analysis", Baptiste Guegan, John Hardin, Justin Stevens and Mike Williams (Massachusetts Institute of Technology, Cambridge, MA, United States). Abstract: Model complexity in amplitude analyses is often a priori under-constrained, since the underlying theory permits a large number of amplitudes to contribute to most physical processes. The use of an overly complex model results in reduced predictive power and worse resolution on unknown parameters of interest. Therefore, it is common to arbitrarily reduce the complexity by removing from consideration some subset of the allowed amplitudes. This paper studies a data-driven method for limiting model complexity through regularization during regression in the context of a multivariate (Dalitz-plot) analysis. The regularization technique applied greatly improves the performance. A method is also proposed for obtaining the significance of a resonance in a multivariate amplitude analysis.

Slide 3: Toy Model. Consider a toy model X → abc in bins of m(abc), and start with a simple 6-wave true model. The final state is studied in bins of m_abc, with the fits performed in each bin being independent. The particles that decay into the abc final state are generically labeled as X. The decay amplitudes are constructed in the isobar formalism, with the observed intensity written as the coherent sum of resonant terms plus a nonresonant term added incoherently:

M(x) = Σ_M | Σ_i a_i e^{iφ_i} A_i^M(x) |² + a_nr²,

where x = (m²_ab, m²_ac) represents the position in the Dalitz plot, a e^{iφ} describes the unknown complex factor for each resonant component of the model, and the outer sum runs over the spin states M of the various X particles. A table in the paper lists the properties and decay channels of the resonances contained in the true p.d.f. The resonant terms contain contributions from Blatt-Weisskopf barrier factors, relativistic Breit-Wigner line shapes to describe the propagators, and spin factors obtained using the Zemach formalism. The amplitudes are evaluated using the qft++ package.

[Figure: the X → abc mass distribution for one simulated data set, overlaid with the contributions from the individual resonant amplitudes, plus the corresponding Dalitz-plot (m²_ab vs m²_ac) distributions in several m_abc bins.]
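To make the formula concrete, here is a minimal numpy sketch of evaluating this intensity. It is an illustration only, not the qft++ implementation used in the paper; the amplitude functions passed in are hypothetical placeholders.

```python
import numpy as np

def intensity(x, a, phi, a_nr, amplitude_fns):
    """M(x) = sum_M | sum_i a_i e^{i phi_i} A_i^M(x) |^2 + a_nr^2.

    x             : (n_points, 2) array of Dalitz-plot positions (m2_ab, m2_ac)
    a, phi        : magnitudes and phases of the unknown complex factors
    amplitude_fns : amplitude_fns[M][i] maps x -> complex A_i^M(x); stand-ins
                    for the Breit-Wigner * barrier-factor * spin-factor terms
    """
    total = np.zeros(len(x))
    for fns_M in amplitude_fns:                   # incoherent sum over spin states M
        coherent = np.zeros(len(x), dtype=complex)
        for a_i, phi_i, A in zip(a, phi, fns_M):  # coherent sum over waves i
            coherent += a_i * np.exp(1j * phi_i) * A(x)
        total += np.abs(coherent) ** 2
    return total + a_nr ** 2                      # nonresonant term, added incoherently
```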

Slide 4: True PDF. If we fit the data using only the waves in the true model PDF, things look good.

[Figure: results of the extended maximum likelihood fit with the true p.d.f.: intensities and phase differences of the [S], [P] and [D] waves vs m_abc.]

Slide 5: Alternative PDF. Fitting with an additional 5 extraneous waves introduces a lot of noise...

[Figure: results of the extended maximum likelihood fit with the full p.d.f., including the additional extraneous waves.]

Slide 6: Alternative PDF (continued). Fitting with the additional extraneous waves introduces a lot of noise...

[Figure: cumulative fraction of extraneous waves with fit fraction greater than the y-axis value vs m_abc. The red line shows 1/√N, where N is the total number of events in the bin; it is shown just to give a sense of the statistical precision in the bin.]

Slide 7: LASSO. With the extraneous waves included, the performance of the fits has clearly deteriorated: the variance of the estimators of the true model parameters has in many places greatly increased, and the fits incorrectly determine that some of the extraneous amplitudes (those not in the true p.d.f.) describe significant contributions. Intuitively we know that: (a) adding more parameters to the fit will make *this* data set better; (b) adding too much complexity to the model actually reduces predictive power and the resolution on regression coefficients. How do we balance these competitors?

Step 1: Penalize complexity in the likelihood (we follow the LASSO in our approach), augmenting the minimized quantity to

-2 log L + λ [ Σ_{i,M} √( ∫ |a_i e^{iφ_i} A_i^M(x)|² dx ) + √( ∫ a_nr² dx ) ],

where λ is the LASSO regularization parameter. Our choice of LASSO penalty term does not depend on the relative normalization of the A(x) terms. Overly complex models tend to use large interference effects to produce unphysical features in the PDF; we penalize this using Σ √(fit fraction), where the fit fraction for resonant term i is defined as ∫ |a_i e^{iφ_i} A_i^M(x)|² dx / ∫ M(x) dx. The parameter λ balances the fit-quality gain against the complexity added: larger values of λ produce simpler models... but how do we choose λ? N.b., once λ is chosen, performing this fit is no different from a normal fit, and the penalty term is constructed using only quantities that are readily available (simple to code up).
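As a sketch of how simple this is to code up, here is a toy version of the penalized objective; the likelihood and per-wave intensity integrals below are hypothetical stand-ins for a real analysis's own.

```python
import numpy as np
from scipy.optimize import minimize

def nll(params):                      # toy stand-in for -log L of the amplitude fit
    return 0.5 * np.sum((params - 1.0) ** 2)

def wave_integral(i, params):         # toy stand-in for int |a_i e^{i phi_i} A_i(x)|^2 dx
    return params[i] ** 2

def penalized_objective(params, lam):
    # -2 log L + lambda * sum_i sqrt(intensity integral of wave i)
    penalty = sum(np.sqrt(wave_integral(i, params)) for i in range(len(params)))
    return 2.0 * nll(params) + lam * penalty

# Once lambda is chosen, this is just an ordinary fit:
result = minimize(penalized_objective, x0=np.ones(6), args=(2.0,), method="Nelder-Mead")
```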

Slide 8: Information Criteria. We consider the two most famous ICs: AIC = -2 log(L) + 2r and BIC = -2 log(L) + r log(n(events)), where r = n(waves) with fit fraction above a small threshold. The choice of threshold doesn't seem to matter (anything at the sub-percent level gives the same result for us).

[Figure: AIC and BIC vs λ.]

BIC produces models no more complex than AIC's. Generic recommendation: try both in your systematic studies, i.e., consider λ(AIC) ≤ λ ≤ λ(BIC). Specific cases: AIC is better when a false negative would be more misleading than a false positive, and BIC is better when a false positive would be more misleading than a false negative.
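A sketch of the resulting λ scan, assuming a hypothetical fit_at(lam) helper that performs one penalized fit and returns (-2 log L, fit fractions); the toy numbers inside it and the 0.5% threshold are illustrative choices, not the paper's.

```python
import numpy as np

def fit_at(lam):
    """Hypothetical helper: run the LASSO fit at this lambda (toy numbers here)."""
    rng = np.random.default_rng(seed=int(lam * 100))
    return 1000.0 + 5.0 * lam, rng.random(8) / (1.0 + lam)

def select_lambda(lambdas, n_events, threshold=0.005):
    best = {"AIC": (np.inf, None), "BIC": (np.inf, None)}
    for lam in lambdas:
        m2logL, fit_fractions = fit_at(lam)
        r = sum(ff > threshold for ff in fit_fractions)  # waves counted as "in" the model
        for name, ic in (("AIC", m2logL + 2 * r),
                         ("BIC", m2logL + r * np.log(n_events))):
            if ic < best[name][0]:
                best[name] = (ic, lam)
    return best  # BIC tends to pick the larger lambda, i.e. the simpler model

print(select_lambda(np.linspace(0.0, 10.0, 21), n_events=1000))
```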

Slide 9: LASSO Fit. The fit including the 5 extraneous waves with the LASSO is nearly identical to the fit using only the true PDF waves.

[Figure: results of the extended maximum likelihood fit with the full p.d.f., including the additional extraneous waves, with the LASSO penalty applied.]

Slide 10: False Waves. The LASSO does a good job of killing extraneous waves.

[Figure: cumulative fraction of extraneous waves with fit fraction greater than the y-axis value vs m_abc, without (left) and with (right) the LASSO; the 1/√N line, where N is the total number of events in the bin, gives a sense of the statistical precision.]

It is now very unlikely to have *any* extraneous wave with a fit fraction > 1/√N.
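The benchmark in the figure is easy to apply in practice; a trivial sketch (the numbers in the usage comment are made up):

```python
import numpy as np

def flag_extraneous(fit_fractions, n_events):
    """Indices of waves whose fit fraction exceeds the statistical scale 1/sqrt(N)."""
    threshold = 1.0 / np.sqrt(n_events)
    return [i for i, ff in enumerate(fit_fractions) if ff > threshold]

# e.g. flag_extraneous([0.002, 0.08, 0.01], n_events=1000) -> [1]   (threshold ~ 0.032)
```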

Slide 11: Bias? Does adding a penalty term to the likelihood bias the results towards smaller fit fractions?

[Figure: ratios of fitted to true fit fractions for the [S], [P] and [D] waves, for fits with the true model, the traditional approach, and the LASSO.]

All fit fractions for the waves we want to measure are unbiased within the uncertainties, and most are *less* biased than when not using the LASSO.

Slide 12: Bias? We also extract the resonance parameters of interest and find less bias using the LASSO (and better resolution, of course).

[Figure: ratios of fitted to true resonance masses and widths, and the phase differences Δφ, for the traditional and LASSO fits.]

13 Model. The LASSO successfully picks out a simple true model, but the real world is that the true model has a lot of nonzero (but small) contributions and we only care about studying the properties of larger waves. Model.: Add 6 more waves to the true model with maximum fit fractions between..%. We don t really care about these, we just want to still make sure we properly measure the waves with fit fractions > %. fit/true.. fit fractions [S] [S] [P] [D] [S] true model.8 traditional LASSO.6 Mike Williams

14 Model. The LASSO successfully picks out a simple true model, but the real world is that the true model has a lot of nonzero (but small) contributions and we only care about studying the properties of larger waves. Model.: Add 6 more waves to the true model with maximum fit fractions between..%. We don t really care about these, we just want to still make sure we properly measure the waves with fit fractions > %. traditional resonance mass [S] [P] [S] LASSO resonance width [S] [P] [S] fit/true.. fit/true Mike Williams

Slide 15: Significance. The LASSO permits selecting the wave set in a way that does not require human intervention. This means that we can now generate ensembles of data sets, run the procedure on them, and determine the statistical significance of any predefined effect. How to define a resonance candidate will be analysis dependent. As a simple example we chose: at least 6 consecutive m(abc) bins in which the wave has a fit fraction above threshold with more than the required significance in each bin, *and* a difference between the minimum and maximum phase difference (relative to a reference wave) of more than π/2. For all waves that satisfy these criteria, we fit their strength and phase motion and obtain a Δχ² relative to the χ² obtained assuming random noise in the wave.
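Once the ensemble of Δχ² values is in hand, converting an observed Δχ² into a p-value and a Gaussian significance is straightforward; a minimal sketch, assuming each toy data set has already been run through the full LASSO+BIC procedure to extract the best candidate's Δχ²:

```python
import numpy as np
from scipy import stats

def significance_from_ensemble(observed_dchi2, toy_dchi2):
    """p-value and Gaussian significance of an observed Delta-chi^2, given the
    Delta-chi^2 of the best noise-only candidate in each simulated data set."""
    toy_dchi2 = np.asarray(toy_dchi2)
    p = np.mean(toy_dchi2 >= observed_dchi2)   # one-sided p-value from the ensemble
    if p == 0.0:                               # observed value beyond all toys:
        p = 1.0 / (len(toy_dchi2) + 1.0)       # quote a conservative upper bound
    return p, stats.norm.isf(p)                # significance via the Gaussian tail

# e.g. p, z = significance_from_ensemble(25.0, np.random.chisquare(2, size=100000))
```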

Slide 16: Significance (continued). Large waves will be significant no matter what, but it is still interesting to see how much bigger Δχ² is with the LASSO for some waves.

[Table: mean and standard deviation of the measured Γ_X distributions extracted from the Breit-Wigner fits for the [S], [P] and [S] waves, for Models I and II, without the LASSO and with the BIC-LASSO.]

[Figure: Δχ² distributions for the resonance amplitudes in the true p.d.f. — [S] (blue), [P] (cyan), and [S] (green) — (a) without the LASSO procedure (i.e. λ = 0) and (b) with the LASSO procedure, using BIC to select the best λ for each data sample and m_abc mass bin.]

The mean of the Δχ² distribution is significantly larger for all the resonant amplitudes in the true p.d.f. when the LASSO procedure is used. To evaluate the extraneous amplitudes, the paper also shows the cumulative maximum Δχ² from the extraneous amplitudes in the fits without and with the LASSO. Only a small percentage of the data sets fit with the LASSO procedure had any extraneous amplitude that satisfied the criteria for being assigned a nonzero Δχ², and the distribution of the maximum Δχ² of any extraneous wave shows what Δχ² is required for a 5σ claim with(out) the LASSO. Any structure that survives the LASSO is unlikely to be a statistical artifact (that doesn't mean it's a resonance, of course).

Slide 17: Summary. Wave-set selection is an important part of amplitude analysis and (IMO) one that should be treated in an unbiased and rigorous way. The statistical community has been tackling this problem for decades, and many solutions that are widely used (outside of physics) exist that are easy to implement and study. The use of methods that do not require human input is desirable: you get out what you put in, and if what you put in is "known resonances + interesting wave", you're likely to get a nonzero contribution for the interesting wave. What does that mean? The use of the LASSO with the (A,B)IC permits an automated procedure that does not bias results (any more than traditional approaches do), gives resolution close to that of a cheat model, and permits defining a test statistic from which a significance can be obtained. We still need the human to interpret the results!
