Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Lindsay Fowler
5 years ago

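1 Review: Optimal Average/Scaling is equivalent to Minimise χ²

Two 1-parameter models:
Estimating $\langle X \rangle$: $\mu_i = \mu$
Scaling a pattern: $\mu_i = A\,P_i$

Two equivalent methods:

Algebra of Random Variables: Optimal Average and Optimal Scaling

$$\hat{X} = \frac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}, \qquad \sigma^2(\hat{X}) = \frac{1}{\sum_i 1/\sigma_i^2}$$

$$\hat{A} = \frac{\sum_i X_i P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}, \qquad \sigma^2(\hat{A}) = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$

Minimising χ² gives the same result:

$$\chi^2 = \chi^2_{\min} + \left(\frac{\alpha - \hat{\alpha}}{\sigma(\hat{\alpha})}\right)^2 + \ldots, \qquad \sigma^2(\hat{\alpha}) = \frac{2}{\left.\partial^2\chi^2/\partial\alpha^2\right|_{\alpha=\hat{\alpha}}}$$

$\Delta\chi^2 = 1$ at $\alpha = \hat{\alpha} \pm \sigma(\hat{\alpha})$.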
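The optimal average and optimal scaling formulas on this slide translate directly into a few lines of code. The following is a minimal sketch, not a reference implementation: the function names `optimal_average` and `optimal_scaling` and the data values are made up for illustration.

```python
import math

def optimal_average(x, sigma):
    """Inverse-variance weighted average of x and its 1-sigma uncertainty."""
    w = [1.0 / s ** 2 for s in sigma]
    xhat = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    return xhat, math.sqrt(1.0 / sum(w))

def optimal_scaling(x, sigma, pattern):
    """Best-fit scale factor A for the model mu_i = A * P_i, and its uncertainty."""
    num = sum(xi * pi / s ** 2 for xi, pi, s in zip(x, pattern, sigma))
    den = sum(pi ** 2 / s ** 2 for pi, s in zip(pattern, sigma))
    return num / den, math.sqrt(1.0 / den)

# Illustrative (made-up) data values, error bars and pattern.
x = [1.0, 2.0, 4.0]
sigma = [1.0, 1.0, 2.0]
pattern = [1.0, 2.0, 3.0]

xhat, xhat_err = optimal_average(x, sigma)
a_hat, a_err = optimal_scaling(x, sigma, pattern)
print(xhat, xhat_err)
print(a_hat, a_err)
```

Setting every $P_i = 1$ in `optimal_scaling` reproduces `optimal_average`, which is one way to see that the average is a special case of scaling a pattern.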

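2 Chi-squared = Badness of Fit

$$\chi^2 \equiv \sum_{i=1}^{N} \left(\frac{X_i - \mu_i(\alpha)}{\sigma_i}\right)^2 \sim \chi^2_{N-M}$$

$X_i$ = data values, $i = 1 \ldots N$
$\sigma_i$ = 1-σ error bar
$\mu_i(\alpha)$ = model predicted data value
$\alpha_k$ = parameters of the model, $k = 1 \ldots M$
$N$ = number of data points
$M$ = number of fitted parameters
$N - M$ = degrees of freedom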

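3 Dancing Data => Dancing χ² Landscape

Fit $M$ parameters to $N$ data points:

$$\chi^2(X, \sigma, \alpha) = \sum_{i=1}^{N} \left(\frac{X_i - \mu_i(\alpha)}{\sigma_i}\right)^2$$

Best-fit parameters $\hat{\alpha}$ minimise χ²:

$$\chi^2 = \chi^2_{\min} + \Delta\chi^2, \qquad \sigma^2(\hat{\alpha}) = \frac{2}{\left.\partial^2\chi^2/\partial\alpha^2\right|_{\alpha=\hat{\alpha}}}, \qquad \Delta\chi^2 = 1 \ \text{at}\ \alpha = \hat{\alpha} \pm \sigma(\hat{\alpha})$$

Caveat: Assumes orthogonal parameters. Generalise to correlated parameters later.

$$\hat{\alpha} \sim G\!\left(\alpha_{\rm true}, \sigma^2(\hat{\alpha})\right)$$
$$\chi^2(\alpha_{\rm true}) \sim \chi^2_N, \qquad \chi^2(\hat{\alpha}) = \chi^2_{\min} \sim \chi^2_{N-M}, \qquad \chi^2(\alpha_{\rm true}) - \chi^2_{\min} \sim \chi^2_M$$

[Figure: χ² versus α, marking $\hat{\alpha}$, $\alpha_{\rm true}$, $\Delta\alpha$, $\chi^2_{\min}$ and $\Delta\chi^2$.]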
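The two routes to the error bar on this slide (the curvature of χ² at its minimum, and the Δχ² = 1 rule) can be compared numerically. The sketch below assumes the simplest 1-parameter model, $\mu_i = \alpha$, and uses a central finite difference for the second derivative; the helper names and the data are illustrative only.

```python
import math

def chi2_constant_model(alpha, x, sigma):
    """chi^2 for the 1-parameter model mu_i = alpha."""
    return sum(((xi - alpha) / s) ** 2 for xi, s in zip(x, sigma))

def sigma_from_curvature(chi2_of, alpha_hat, h=1e-3):
    """sigma(alpha_hat) from sigma^2 = 2 / (d^2 chi^2 / d alpha^2),
    with the second derivative taken as a central finite difference."""
    curvature = (chi2_of(alpha_hat + h) - 2.0 * chi2_of(alpha_hat)
                 + chi2_of(alpha_hat - h)) / h ** 2
    return math.sqrt(2.0 / curvature)

# Illustrative (made-up) data and error bars.
x = [1.0, 2.0, 4.0]
sigma = [1.0, 1.0, 2.0]

def chi2_of(alpha):
    return chi2_constant_model(alpha, x, sigma)

# For this model the chi^2 minimum is the optimal (inverse-variance) average.
w = [1.0 / s ** 2 for s in sigma]
alpha_hat = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

err = sigma_from_curvature(chi2_of, alpha_hat)
delta_chi2 = chi2_of(alpha_hat + err) - chi2_of(alpha_hat)
print(alpha_hat, err, delta_chi2)
```

Because χ² is exactly quadratic in α for this model, the curvature estimate agrees with $1/\sum_i 1/\sigma_i^2$ and stepping one error bar away from the minimum raises χ² by 1, up to rounding.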

4 Constructing χ² from Gaussians

Sum of squares of $N$ independent Gaussian random variables:
$\chi^2_N$ = Chi-squared with $N$ degrees of freedom.

$X$ and $Y$ are independent Gaussian random variables:

$$X \sim G(0,1), \qquad Y \sim G(0,1)$$
$$X^2 \sim \chi^2_1, \qquad Y^2 \sim \chi^2_1, \qquad X^2 + Y^2 \sim \chi^2_2$$

and so on for each new degree of freedom:

$$\chi^2_N + \chi^2_M \sim \chi^2_{N+M}$$
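The additivity rule can be checked with a small Monte Carlo experiment. This is only a sketch: the seed, the number of trials and the helper name `chi2_sample` are arbitrary choices, and the check relies on the standard results that a χ² variable with $N$ degrees of freedom has mean $N$ and variance $2N$.

```python
import random
import statistics

def chi2_sample(n_dof, rng):
    """One draw of chi^2_N: the sum of squares of N independent G(0,1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_dof))

rng = random.Random(42)   # fixed seed so the run is repeatable
trials = 20000

# chi^2_2 + chi^2_3 should behave like chi^2_5.
sums = [chi2_sample(2, rng) + chi2_sample(3, rng) for _ in range(trials)]
mean_sum = statistics.fmean(sums)
var_sum = statistics.variance(sums)
print(mean_sum, var_sum)  # expect values near 5 and 10
```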

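5 Review: $\chi^2_N$ distribution, $N$ degrees of freedom

$$f(x) = \frac{1}{2^{N/2}\,\Gamma(N/2)}\, x^{(N/2 - 1)}\, e^{-x/2}$$

$$\Gamma(1) = 1, \qquad \Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(n) = (n-1)!, \qquad \Gamma(x+1) = x\,\Gamma(x)$$

e.g. $\Gamma(3/2) = (1/2)\,\Gamma(1/2) = \sqrt{\pi}/2$

$$\chi^2_1:\ f(x) = \frac{e^{-x/2}}{\sqrt{2\pi x}}, \qquad \chi^2_2:\ f(x) = \frac{1}{2}\,e^{-x/2}$$

$$\langle \chi^2_N \rangle = N, \qquad \sigma^2(\chi^2_N) = 2N$$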
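As a sanity check on the density, it can be evaluated with the standard-library gamma function and integrated numerically. The sketch below uses a plain midpoint rule; the integration limit and step count are arbitrary choices that happen to be adequate for $N = 4$, and the function names are illustrative.

```python
import math

def chi2_pdf(x, n):
    """Probability density of chi-squared with n degrees of freedom, for x > 0."""
    half = n / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))

def chi2_moments(n, upper=100.0, steps=100000):
    """Midpoint-rule estimates of (normalisation, mean, variance) of chi^2_n."""
    dx = upper / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        f = chi2_pdf(x, n) * dx
        m0 += f
        m1 += x * f
        m2 += x * x * f
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean ** 2

norm, mean, var = chi2_moments(4)
print(norm, mean, var)  # expect values near 1, 4 and 8
```

The midpoint rule avoids evaluating the density at $x = 0$, where it diverges for $N = 1$.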

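6 Data points with no error bars

$N$ data points: $\langle X_i \rangle = \langle X \rangle$, $\mathrm{Cov}(X_i, X_j) = \sigma^2 \delta_{ij}$

Sample mean:

$$\bar{X} \equiv \frac{1}{N}\sum_i X_i, \qquad \text{unbiased: } \langle \bar{X} \rangle = \langle X \rangle, \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{N}$$

But the $\sigma_i$ are unknown. How can we estimate $\sigma$?

Variance: $\sigma^2 \equiv \langle (X - \langle X \rangle)^2 \rangle$

Try:

$$s^2 \equiv \frac{1}{N}\sum_i (X_i - \bar{X})^2$$

Is $\langle s^2 \rangle = \sigma^2$? No. $\langle s^2 \rangle < \sigma^2$. We can correct for this bias.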

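7 Sample Variance $S^2$: Unbiased for $\sigma^2$

$$S^2 \equiv A \sum_i (X_i - \bar{X})^2$$

Pick $A$ so that $\langle S^2 \rangle = \sigma^2$.

$$\langle (X_i - \bar{X})^2 \rangle = \langle [(X_i - \langle X \rangle) - (\bar{X} - \langle X \rangle)]^2 \rangle$$
$$= \langle (X_i - \langle X \rangle)^2 \rangle - 2\,\langle (X_i - \langle X \rangle)(\bar{X} - \langle X \rangle) \rangle + \langle (\bar{X} - \langle X \rangle)^2 \rangle$$
$$= \sigma^2 - 2\,\mathrm{Cov}(X_i, \bar{X}) + \sigma^2(\bar{X}) = \sigma^2 - \frac{2\sigma^2}{N} + \frac{\sigma^2}{N} = \left(1 - \frac{1}{N}\right)\sigma^2 = \frac{N-1}{N}\,\sigma^2$$

Note: $\mathrm{Cov}(X_i, \bar{X}) = \sigma^2/N$.

$$\langle S^2 \rangle = A\,(N-1)\,\sigma^2 \quad\Rightarrow\quad \text{pick } A = \frac{1}{N-1}$$

$$S^2 \equiv \frac{1}{N-1}\sum_i (X_i - \bar{X})^2$$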
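A simulation makes the bias visible. The sketch below draws many small Gaussian samples and averages the sum of squared deviations divided by $N$ and by $N-1$; the sample size, σ, seed and number of trials are arbitrary illustrative choices.

```python
import random

def variance_estimates(x):
    """Return the sum of squared deviations from the sample mean,
    divided by N (biased) and by N - 1 (unbiased)."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    return ss / n, ss / (n - 1)

rng = random.Random(1)            # fixed seed so the run is repeatable
n, sigma, trials = 5, 2.0, 40000  # true variance is sigma**2 = 4

total_n = total_n1 = 0.0
for _ in range(trials):
    x = [rng.gauss(10.0, sigma) for _ in range(n)]
    s2_n, s2_n1 = variance_estimates(x)
    total_n += s2_n
    total_n1 += s2_n1

mean_s2_n = total_n / trials      # expect about (N-1)/N * sigma^2 = 3.2
mean_s2_n1 = total_n1 / trials    # expect about sigma^2 = 4
print(mean_s2_n, mean_s2_n1)
```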

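8 Evaluation of $\mathrm{Cov}(X_i, \bar{X})$

$$\mathrm{Cov}(X_i, \bar{X}) \equiv \langle (X_i - \langle X \rangle)(\bar{X} - \langle X \rangle) \rangle$$

Note: $\langle X_i \rangle = \langle \bar{X} \rangle = \langle X \rangle$, and $\mathrm{Cov}(X_i, X_j) \equiv \sigma^2 \delta_{ij}$.

Shift coords to put $\langle X \rangle = 0$:

$$\mathrm{Cov}(X_i, \bar{X}) = \langle (X_i - 0)(\bar{X} - 0) \rangle = \left\langle X_i\,\frac{1}{N}\sum_k X_k \right\rangle = \frac{1}{N}\sum_k \langle X_i X_k \rangle = \frac{1}{N}\sum_k \sigma^2 \delta_{ik} = \frac{\sigma^2}{N}$$

[Figure: line with slope = $1/N$.]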

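9 Sample Variance $S^2$: Unbiased for $\sigma^2$

$$S^2 \equiv \frac{1}{N-1}\sum_i (X_i - \bar{X})^2$$

Why $N-1$, not $N$? Because $\bar{X}$ "chases" the dancing data points, removing 1 "degree-of-freedom" from the dance.

$$S^2 \sim \frac{\sigma^2}{N-1}\,\chi^2_{N-1}$$

$$\langle S^2 \rangle = \frac{\sigma^2}{N-1}\,\langle \chi^2_{N-1} \rangle = \frac{\sigma^2}{N-1}\,(N-1) = \sigma^2$$

$$\mathrm{Var}[S^2] = \left(\frac{\sigma^2}{N-1}\right)^2 \mathrm{Var}[\chi^2_{N-1}] = \frac{\sigma^4}{(N-1)^2}\,2\,(N-1) = \frac{2\,\sigma^4}{N-1}$$

$$\frac{\sigma(S^2)}{\langle S^2 \rangle} = \left(\frac{2}{N-1}\right)^{1/2} = \text{fractional accuracy}$$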

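10 Degrees of Freedom (DoF)

$N$ data points: $\langle X_i \rangle = \langle X \rangle$, $\mathrm{Cov}(X_i, X_j) = \sigma_i^2 \delta_{ij}$

$$\sum_{i=1}^{N} \left(\frac{X_i - \langle X \rangle}{\sigma_i}\right)^2 \sim \chi^2_N. \quad N \text{ degrees of freedom.}$$

If $\langle X \rangle$ is unknown, use $\hat{X}$ instead:

$$\sum_{i=1}^{N} \left(\frac{X_i - \hat{X}}{\sigma_i}\right)^2 \sim \chi^2_{N-1}. \quad N-1 \text{ degrees of freedom.}$$

If $N = 1$ data point, $\hat{X} = X_1$:

$$\left(\frac{X_1 - \langle X \rangle}{\sigma_1}\right)^2 \sim \chi^2_1. \quad 1 \text{ degree of freedom.}$$
$$\left(\frac{X_1 - \hat{X}}{\sigma_1}\right)^2 = 0. \quad 0 \text{ degrees of freedom.}$$

Fit $M$ parameters to $N$ data points:

$$\sum_{i=1}^{N} \left(\frac{X_i - \mu_i(\hat{\alpha})}{\sigma_i}\right)^2 \sim \chi^2_{N-M}. \quad N-M \text{ degrees of freedom.}$$

Each fitted parameter removes 1 degree of freedom from the residuals.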
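The loss of one degree of freedom per fitted parameter shows up directly in a simulation. The sketch below lets made-up data "dance" about a known mean and compares the average χ² measured about the true mean with the average χ² measured about the fitted optimal average; the error bars, seed and number of trials are arbitrary.

```python
import random

def optimal_average(x, sigma):
    """Inverse-variance weighted average."""
    w = [1.0 / s ** 2 for s in sigma]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def chi2_about(x, sigma, centre):
    """chi^2 of the data about a single central value."""
    return sum(((xi - centre) / s) ** 2 for xi, s in zip(x, sigma))

rng = random.Random(7)           # fixed seed so the run is repeatable
true_mean = 5.0
sigma = [0.5, 1.0, 1.5, 2.0]     # N = 4 data points with unequal error bars
trials = 20000

sum_true = sum_fit = 0.0
for _ in range(trials):
    x = [rng.gauss(true_mean, s) for s in sigma]
    sum_true += chi2_about(x, sigma, true_mean)
    sum_fit += chi2_about(x, sigma, optimal_average(x, sigma))

mean_chi2_true = sum_true / trials   # expect about N = 4
mean_chi2_fit = sum_fit / trials     # expect about N - 1 = 3
print(mean_chi2_true, mean_chi2_fit)
```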

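11 Is $(S^2)^{1/2}$ unbiased for $\sigma$?

The sample variance $S^2$ is unbiased for $\sigma^2$, i.e. $\langle S^2 \rangle = \sigma^2$.

Is $(S^2)^{1/2}$ unbiased for $\sigma$? No. The square root introduces a bias:

$$\langle (S^2)^{1/2} \rangle \neq \sigma$$

$\langle S \rangle < \sigma$, even though $\langle S^2 \rangle = \sigma^2$.

Homework: Work out the bias correction factor as a function of $N$.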
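Before attempting the homework, it can help to see the bias numerically. The following sketch averages $S$ and $S^2$ over many small Gaussian samples; it only illustrates the effect and does not derive the correction factor. The sample size, seed and trial count are arbitrary.

```python
import math
import random

rng = random.Random(3)            # fixed seed so the run is repeatable
n, sigma, trials = 4, 1.0, 40000

sum_s = sum_s2 = 0.0
for _ in range(trials):
    x = [rng.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # unbiased sample variance
    sum_s2 += s2
    sum_s += math.sqrt(s2)

mean_s2 = sum_s2 / trials   # close to sigma^2
mean_s = sum_s / trials     # noticeably below sigma for small N
print(mean_s2, mean_s)
```

Repeating the run with larger `n` shows the gap between `mean_s` and `sigma` shrinking as the sample size grows.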

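12 Robust estimation methods

Robust => less sensitive to bad data.
Example: using the median rather than the mean.

The Sample Mean $\bar{X}$ minimises the Sample Variance:

$$S^2 \equiv \frac{1}{N-1}\sum_i (X_i - \mu)^2, \qquad \frac{\partial S^2}{\partial \mu} = 0 \ \text{ for } \mu = \bar{X}$$

The Median $M$ minimises the Mean Absolute Deviation:

$$\mathrm{MAD} \equiv \frac{1}{N}\sum_i |X_i - \mu|, \qquad \frac{\partial\,\mathrm{MAD}}{\partial \mu} = 0 \ \text{ for } \mu = M$$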

13 Mean vs Median

The median is less sensitive to outliers than the mean.
The median is unbiased, but not a minimum-variance estimator.
Note how the standard deviations of the median and of the mean vary with sample size.

[Figure: panels comparing the mean and the median.]
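The comparison of scatter can be reproduced with a short simulation. This sketch assumes clean Gaussian data with unit variance (no outliers), for which the mean is expected to scatter by $1/\sqrt{N}$ and the median by somewhat more; the sample sizes, seed and trial count are arbitrary choices.

```python
import random
import statistics

def scatter_of_mean_and_median(n, trials, rng):
    """Standard deviations of the sample mean and sample median
    over many Gaussian samples of size n."""
    means, medians = [], []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(x))
        medians.append(statistics.median(x))
    return statistics.pstdev(means), statistics.pstdev(medians)

rng = random.Random(11)   # fixed seed so the run is repeatable
results = {n: scatter_of_mean_and_median(n, 4000, rng) for n in (5, 25, 101)}
for n, (sd_mean, sd_median) in results.items():
    print(n, sd_mean, sd_median, sd_median / sd_mean)
```

For Gaussian data the ratio of the two scatters approaches $\sqrt{\pi/2} \approx 1.25$ at large $N$, which is the price paid for the median's robustness when there are no outliers.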

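14 Proof that the Median minimises the MAD

$$\mathrm{MAD} \equiv \frac{1}{N}\sum_i |X_i - \mu| = \frac{1}{N}\sum_i (X_i - \mu)\,H(X_i - \mu)$$

$$H(x) \equiv \begin{cases} +1, & x > 0 \\ \ \ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$

$$\frac{d\,\mathrm{MAD}}{d\mu} = -\frac{1}{N}\sum_i \left[ H(X_i - \mu) + (X_i - \mu)\,H'(X_i - \mu) \right] = -\frac{1}{N}\sum_i H(X_i - \mu)$$
$$= -\frac{1}{N}\left[ N(X_i > \mu) - N(X_i < \mu) \right] = 0 \ \text{ if } \mu = \mathrm{median}(X)$$

Here $dH/dx = 2\,\delta(x)$, and the term $x\,H'(x) = 0$ since $H'(x) = 0$ whenever $x \neq 0$.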
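The result can be confirmed by brute force: scan trial values of μ and see where each badness-of-fit statistic is smallest. This is a sketch on made-up data with one outlier; the grid spacing and range are arbitrary.

```python
import statistics

def mad(x, mu):
    """Mean absolute deviation of the data about mu."""
    return sum(abs(xi - mu) for xi in x) / len(x)

def sample_variance_about(x, mu):
    """Sum of squared deviations about mu, divided by N - 1."""
    return sum((xi - mu) ** 2 for xi in x) / (len(x) - 1)

x = [1.0, 2.0, 3.0, 4.0, 100.0]          # made-up data with one outlier
grid = [k / 100 for k in range(10001)]   # trial values of mu from 0 to 100

mu_mad = min(grid, key=lambda mu: mad(x, mu))
mu_s2 = min(grid, key=lambda mu: sample_variance_about(x, mu))
print(mu_mad, statistics.median(x))   # minimiser of MAD vs the median
print(mu_s2, statistics.fmean(x))     # minimiser of S^2 vs the mean
```

The outlier drags the minimum of the sample variance far from the bulk of the data, while the minimum of the MAD stays at the middle point.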

15 Find the Median without Sorting

A useful algorithm (Is it faster than sorting?):

Since

$$\sum_i \frac{X_i - M}{|X_i - M|} = 0,$$

first make a guess at $M$. Then estimate a new

$$M = \frac{\sum_i X_i / |X_i - M|}{\sum_i 1 / |X_i - M|}$$

and iterate to convergence.
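One possible reading of this iteration is a weighted mean with weights $1/|X_i - M|$, recomputed from the previous guess until the estimate stops changing. The sketch below follows that reading; the starting guess (the mean), the tolerance and the small floor `eps` that prevents division by zero are assumptions added here, not part of the slide.

```python
def median_by_iteration(x, tol=1e-10, max_iter=1000, eps=1e-12):
    """Iteratively reweighted estimate of the median, without sorting.

    Each pass replaces the guess m by the weighted mean of the data with
    weights 1/|x_i - m|. The floor eps avoids dividing by zero when the
    guess coincides with a data value.
    """
    m = sum(x) / len(x)                      # first guess: the mean
    for _ in range(max_iter):
        w = [1.0 / max(abs(xi - m), eps) for xi in x]
        m_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

estimate = median_by_iteration([1.0, 2.0, 3.0, 4.0, 100.0])
print(estimate)   # approaches the median, 3, despite the outlier
```

Two caveats apply to this sketch: for an even number of points any value between the two middle points minimises the MAD, so the result need not equal the conventional midpoint, and a guess that lands exactly on a data value that is not the median can stall there.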

16 Median Filter and Sigma-Clip

Median filter: take a window of $N$ points centred at time $t$; medfilt($t$) is the median of the $N$ points.

Sigma-clip:
Fit all points by minimising χ².
Set threshold $K$ and check for outliers at $\pm K\sigma$ or more.
Repeat fit omitting largest outlier.
Iterate until set of rejected points converges.

[Figure: a sliding window over the data for the median filter; points beyond the threshold marked "Reject" for the sigma-clip.]
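Both procedures are short to code. The sketch below makes two simplifying assumptions that are not on the slide: the median filter truncates its window at the ends of the series, and the sigma-clip uses the simplest possible fit, a constant model whose χ² minimum is the optimal average. The function names and data are illustrative.

```python
import statistics

def median_filter(y, width):
    """Running median over a window of `width` points (width should be odd).
    Windows are truncated at the two ends of the series."""
    half = width // 2
    return [statistics.median(y[max(0, i - half): i + half + 1])
            for i in range(len(y))]

def sigma_clip_average(x, sigma, k=3.0):
    """Optimal average with iterative rejection: refit, find the largest
    normalised residual, drop that point if it exceeds k, and repeat."""
    keep = list(range(len(x)))
    while True:
        w = [1.0 / sigma[i] ** 2 for i in keep]
        mu = sum(wi * x[i] for wi, i in zip(w, keep)) / sum(w)
        worst = max(keep, key=lambda i: abs(x[i] - mu) / sigma[i])
        if abs(x[worst] - mu) / sigma[worst] <= k or len(keep) <= 2:
            return mu, keep
        keep.remove(worst)

print(median_filter([1, 2, 50, 4, 5], 3))

x = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0]   # made-up data; the last point is bad
sigma = [0.1] * len(x)
mu, kept = sigma_clip_average(x, sigma)
print(mu, kept)
```

In this example the bad point pulls the first fit so far that every point lies beyond $3\sigma$, which illustrates why only the largest outlier is omitted before refitting.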

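17 Various Badness-of-Fit Statistics

Sample Variance (minimised by the mean $\bar{X}$):
$$S^2 \equiv \frac{1}{N-1}\sum_i (X_i - \mu_i)^2$$

Chi-squared (minimised by the optimal average $\hat{X}$):
$$\chi^2 \equiv \sum_i \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$$

Mean Absolute Deviation (minimised by the median $M$):
$$\mathrm{MAD} \equiv \frac{1}{N}\sum_i |X_i - \mu_i|$$

Sum Absolute Normalised Errors:
$$\mathrm{SANE} \equiv \sum_i \frac{|X_i - \mu_i|}{\sigma_i}$$

[Figure: badness functions $\varepsilon^2$, $\chi^2$, $|\varepsilon|$ and $|\chi|$, with the sigma-clip threshold at $\pm K\sigma$.]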

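18 $S^2$ = Sample Variance, MAD = Mean Absolute Deviation

Badness of Fit $S^2(\mu)$: $(N-1)\,S^2 = \sum_i \varepsilon_i^2$, with $\varepsilon_i \equiv X_i - \mu$.
Badness of Fit $\mathrm{MAD}(\mu)$: $N\,\mathrm{MAD} = \sum_i |\varepsilon_i|$.

[Figure: $S^2(\mu)$ and $\mathrm{MAD}(\mu)$ versus $\mu$ for 3 good points and 1 bad point, each shown with and without the bad point; the median is marked on the MAD panel.]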

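19 χ² = Sum of Squared Normalised Errors, SANE = Sum Absolute Normalised Errors

Badness of Fit $\chi^2(\mu)$: $\chi^2 \equiv \sum_i \chi_i^2$, with $\chi_i \equiv (X_i - \mu)/\sigma_i$.
Badness of Fit $\mathrm{SANE}(\mu)$: $\mathrm{SANE} \equiv \sum_i |\chi_i|$.

[Figure: $\chi^2(\mu)$ and $\mathrm{SANE}(\mu)$ versus $\mu$ for 3 good points and 1 bad point, each shown with and without the bad point; $\hat{X}$ is marked on the χ² panel.]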

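20 χ² = Sum of Squared Normalised Errors, and a clipped Badness of Fit Statistic

Badness of Fit $\chi^2(\mu)$: $\chi^2 \equiv \sum_i \chi_i^2$, with $\chi_i \equiv (X_i - \mu)/\sigma_i$.
Badness of Fit $\mathrm{BoF}(\mu)$:

$$\mathrm{BoF} \equiv \sum_i \left[ 1 - \exp\{ -\chi_i^2 \} \right]$$

Note the local minimum.

[Figure: $\chi^2(\mu)$ and $\mathrm{BoF}(\mu)$ versus $\mu$ for 3 good points and 1 bad point, each shown with and without the bad point; $\hat{X}$ is marked on the χ² panel.]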
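The behaviour of the different badness-of-fit statistics with 3 good points and 1 bad point can be explored by scanning μ over a grid. In the sketch below the data values are made up, and the clipped statistic is assumed to take the form $\sum_i [1 - \exp(-\chi_i^2)]$; the numerical factor in the exponent is an assumption, so treat the clipped curve as qualitative.

```python
import math

x = [9.0, 10.0, 11.0, 30.0]      # 3 good points and 1 bad (made-up values)
sigma = [1.0, 1.0, 1.0, 1.0]

def normalised_residuals(mu):
    return [(xi - mu) / s for xi, s in zip(x, sigma)]

def chi2(mu):
    """Sum of squared normalised errors."""
    return sum(c ** 2 for c in normalised_residuals(mu))

def sane(mu):
    """Sum of absolute normalised errors."""
    return sum(abs(c) for c in normalised_residuals(mu))

def clipped_bof(mu):
    """Assumed clipped form: each point contributes 1 - exp(-chi_i^2),
    which saturates at 1 for points far from mu."""
    return sum(1.0 - math.exp(-c ** 2) for c in normalised_residuals(mu))

grid = [k / 100 for k in range(4001)]   # trial values of mu from 0 to 40
mu_chi2 = min(grid, key=chi2)           # dragged towards the bad point
mu_sane = min(grid, key=sane)           # stays among the good points
mu_bof = min(grid, key=clipped_bof)     # global minimum among the good points
print(mu_chi2, mu_sane, mu_bof)
print(clipped_bof(10.0), clipped_bof(30.0))  # global vs local minimum values
```

With an even number of points, SANE is flat between the two middle values, so the grid search simply returns one trial value from that interval. The clipped statistic has a second, shallower minimum at the bad point, which is why a minimiser started near the outlier can get trapped there.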
