Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Brianna Booth
5 years ago

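1 Review: Optimal Average/Scaling is equivalent to Minimising $\chi^2$
Two 1-parameter models:
  Estimating $\langle X \rangle$: $\mu_i = \mu$
  Scaling a pattern: $\mu_i = A\,P_i$
Two equivalent methods:
Algebra of Random Variables (Optimal Average and Optimal Scaling):
  $\hat{X} = \dfrac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$, $\quad \sigma^2(\hat{X}) = \dfrac{1}{\sum_i 1/\sigma_i^2}$
  $\hat{A} = \dfrac{\sum_i X_i P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}$, $\quad \sigma^2(\hat{A}) = \dfrac{1}{\sum_i P_i^2/\sigma_i^2}$
Minimising $\chi^2$ gives the same result:
  $\chi^2 = \chi^2_{\min} + \left(\dfrac{\alpha - \hat\alpha}{\sigma(\hat\alpha)}\right)^2 + \ldots$
  $\sigma^2(\hat\alpha) = \dfrac{2}{\left.\partial^2\chi^2/\partial\alpha^2\right|_{\alpha=\hat\alpha}}$
  $\Delta\chi^2 = 1$ at $\alpha = \hat\alpha \pm \sigma(\hat\alpha)$
(Figure: $\chi^2$ versus $\alpha$, with minimum $\chi^2_{\min}$ at $\hat\alpha$.)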
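The two estimators on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function names `optimal_average` and `optimal_scaling` and the sample numbers are assumptions made for the example.

```python
def optimal_average(x, sigma):
    """Inverse-variance weighted mean of x and its 1-sigma uncertainty."""
    w = [1.0 / s**2 for s in sigma]
    xhat = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    return xhat, (1.0 / sum(w)) ** 0.5


def optimal_scaling(x, pattern, sigma):
    """Best-fit scale factor A for the model mu_i = A * P_i, with uncertainty."""
    num = sum(xi * pi / s**2 for xi, pi, s in zip(x, pattern, sigma))
    den = sum(pi**2 / s**2 for pi, s in zip(pattern, sigma))
    return num / den, (1.0 / den) ** 0.5


# Illustrative data (made up for this example).
xhat, xerr = optimal_average([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])
a_hat, a_err = optimal_scaling([2.1, 3.9, 6.2], [1.0, 2.0, 3.0], [0.5, 0.5, 1.0])
print(xhat, xerr)
print(a_hat, a_err)
```

With a pattern of all ones, `optimal_scaling` reduces to `optimal_average`, which is the sense in which the optimal average is a special case of optimal scaling.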

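2 Chi-squared = Badness of Fit
  $\chi^2 \equiv \sum_{i=1}^{N} \left(\dfrac{X_i - \mu_i(\alpha)}{\sigma_i}\right)^2 \sim \chi^2_{N-M}$
  $X_i$ = data values, $i = 1 \ldots N$
  $\sigma_i$ = 1-$\sigma$ error bar
  $\mu_i(\alpha)$ = model-predicted data value
  $\alpha_k$ = parameters of the model, $k = 1 \ldots M$
  $N$ = number of data points
  $M$ = number of fitted parameters
  $N - M$ = degrees of freedom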

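3 The Dancing $\chi^2$ Landscape
Fit $M$ parameters to $N$ data points:
  $\chi^2(X, \sigma, \alpha) = \sum_{i=1}^{N} \left(\dfrac{X_i - \mu_i(\alpha)}{\sigma_i}\right)^2$
Best-fit parameters $\hat\alpha$ minimise $\chi^2$, giving $\hat\alpha \pm \sigma(\hat\alpha)$:
  $\sigma^2(\hat\alpha) = \dfrac{2}{\left.\partial^2\chi^2/\partial\alpha^2\right|_{\alpha=\hat\alpha}}$
  $\Delta\chi^2 \equiv \chi^2 - \chi^2_{\min} = 1$ at $\Delta\alpha = \pm\sigma(\hat\alpha)$
Distributions:
  $\hat\alpha \sim G\left(\alpha_{\rm true}, \sigma^2(\hat\alpha)\right)$
  $\chi^2(\alpha_{\rm true}) \sim \chi^2_N$
  $\chi^2(\hat\alpha) = \chi^2_{\min} \sim \chi^2_{N-M}$
  $\chi^2(\alpha_{\rm true}) - \chi^2_{\min} \sim \chi^2_M$
(Figure: $\chi^2$ versus $\alpha$, marking $\alpha_{\rm true}$, $\hat\alpha$, $\chi^2_{\min}$ and $\Delta\chi^2 = 1$.)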
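The $\Delta\chi^2 = 1$ rule and the curvature formula can be checked numerically for a one-parameter scaling fit, where $\chi^2$ is exactly quadratic in the parameter. The sketch below uses made-up data, and the helper names `chi_squared` and `chi2_of` are illustrative.

```python
def chi_squared(x, mu, sigma):
    """Sum of squared normalised residuals."""
    return sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu, sigma))


# Illustrative data for the model mu_i = A * P_i (values are assumptions).
x = [2.1, 3.9, 6.2]
pattern = [1.0, 2.0, 3.0]
sigma = [0.5, 0.5, 1.0]

# Analytic optimal scaling and its uncertainty.
den = sum(p**2 / s**2 for p, s in zip(pattern, sigma))
a_hat = sum(xi * p / s**2 for xi, p, s in zip(x, pattern, sigma)) / den
sig_a = den ** -0.5


def chi2_of(a):
    return chi_squared(x, [a * p for p in pattern], sigma)


chi2_min = chi2_of(a_hat)

# Delta chi^2 one sigma away from the best fit should equal 1.
delta_chi2 = chi2_of(a_hat + sig_a) - chi2_min

# sigma^2 = 2 / (second derivative of chi^2), by central finite differences.
h = 1e-3
curvature = (chi2_of(a_hat + h) - 2.0 * chi2_min + chi2_of(a_hat - h)) / h**2
sig_from_curvature = (2.0 / curvature) ** 0.5

print(delta_chi2, sig_a, sig_from_curvature)
```

For models that are nonlinear in the parameter the parabola is only the leading term of the expansion, so the agreement would be approximate rather than exact.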

4 Constructing $\chi^2_N$ from Gaussians
$\chi^2_N$ = chi-squared with $N$ degrees of freedom = sum of squares of $N$ independent Gaussian random variables.
$X$ and $Y$ are independent Gaussian random variables:
  $X \sim G(0,1)$, $\quad Y \sim G(0,1)$
  $X^2 \sim \chi^2_1$, $\quad Y^2 \sim \chi^2_1$
  $X^2 + Y^2 \sim \chi^2_2$
and so on for each new degree of freedom:
  $\chi^2_N + \chi^2_M \sim \chi^2_{N+M}$
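A quick Monte Carlo sketch of this construction: summing the squares of independent $G(0,1)$ draws should give a variable with mean $N$ and variance $2N$. The seed, sample size and the name `chi2_sample` are arbitrary choices for the illustration.

```python
import random


def chi2_sample(n_dof, rng):
    """One chi-squared draw built as a sum of n_dof squared G(0,1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_dof))


rng = random.Random(42)  # fixed seed so the run is repeatable
n_dof = 5
draws = [chi2_sample(n_dof, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, var)  # expected to be close to n_dof and 2 * n_dof
```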

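5 Review: $\chi^2_N$ distribution
$N$ degrees of freedom:
  $f(x) = \dfrac{1}{2^{N/2}\,\Gamma(N/2)}\, x^{(N/2 - 1)}\, e^{-x/2}$
Gamma function:
  $\Gamma(1) = 1$, $\quad \Gamma(1/2) = \sqrt{\pi}$, $\quad \Gamma(n) = (n-1)!$, $\quad \Gamma(x+1) = x\,\Gamma(x)$
  e.g. $\Gamma(3/2) = (1/2)\,\Gamma(1/2) = \sqrt{\pi}/2$
Special cases:
  $\chi^2_1$: $f(x) = \dfrac{e^{-x/2}}{\sqrt{2\pi x}}$
  $\chi^2_2$: $f(x) = \dfrac{1}{2}\, e^{-x/2}$
Moments:
  $\langle \chi^2_N \rangle = N$, $\quad \sigma^2(\chi^2_N) = 2N$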
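The density can be evaluated directly with the standard-library gamma function. The sketch below (the name `chi2_pdf` is illustrative) compares the general formula with the two special cases and checks the mean for $N = 4$ by a simple midpoint-rule integral.

```python
import math


def chi2_pdf(x, n_dof):
    """Chi-squared probability density with n_dof degrees of freedom, for x > 0."""
    half = n_dof / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0**half * math.gamma(half))


x0 = 1.3
print(chi2_pdf(x0, 1), math.exp(-x0 / 2.0) / math.sqrt(2.0 * math.pi * x0))
print(chi2_pdf(x0, 2), 0.5 * math.exp(-x0 / 2.0))

# Midpoint-rule estimate of <chi^2_4>, integrating x f(x) from 0 to 60.
dx = 0.01
mean_n4 = sum(
    (i + 0.5) * dx * chi2_pdf((i + 0.5) * dx, 4) * dx for i in range(6000)
)
print(mean_n4)  # should be close to 4
```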

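6 Data points with no error bars
$N$ data points: $\langle X_i \rangle = \langle X \rangle$, $\quad {\rm Cov}(X_i, X_j) = \sigma^2 \delta_{ij}$
Sample mean: $\bar{X} \equiv \dfrac{1}{N} \sum_i X_i$
  unbiased: $\langle \bar{X} \rangle = \langle X \rangle$, $\quad {\rm Var}(\bar{X}) = \sigma^2 / N$
But the $\sigma_i$ are unknown. How can we estimate $\sigma$?
Variance: $\sigma^2 \equiv \langle (X - \langle X \rangle)^2 \rangle$
Try: $s^2 \equiv \dfrac{1}{N} \sum_i (X_i - \bar{X})^2$
Is $\langle s^2 \rangle = \sigma^2$? No. $\langle s^2 \rangle < \sigma^2$.
We can correct for this bias.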

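7 Sample Variance $S^2$: Unbiased for $\sigma^2$
  $S^2 \equiv A \sum_i (X_i - \bar{X})^2$
Pick $A$ so that $\langle S^2 \rangle = A \sum_i \langle (X_i - \bar{X})^2 \rangle = \sigma^2$:
  $\langle (X_i - \bar{X})^2 \rangle = \langle [(X_i - \langle X \rangle) - (\bar{X} - \langle X \rangle)]^2 \rangle$
  $= \langle (X_i - \langle X \rangle)^2 \rangle - 2 \langle (X_i - \langle X \rangle)(\bar{X} - \langle X \rangle) \rangle + \langle (\bar{X} - \langle X \rangle)^2 \rangle$
  $= \sigma^2(X_i) - 2\,{\rm Cov}(X_i, \bar{X}) + \sigma^2(\bar{X})$
  $= \sigma^2 - \dfrac{2\sigma^2}{N} + \dfrac{\sigma^2}{N} = \left(1 - \dfrac{1}{N}\right)\sigma^2 = \dfrac{N-1}{N}\,\sigma^2$
Note: ${\rm Cov}(X_i, \bar{X}) = \sigma^2 / N$.
  $\langle S^2 \rangle = A\,(N-1)\,\sigma^2$
Pick $A = \dfrac{1}{N-1}$:
  $S^2 \equiv \dfrac{1}{N-1} \sum_i (X_i - \bar{X})^2$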

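8 Evaluation of ${\rm Cov}(X_i, \bar{X})$
  ${\rm Cov}(X_i, \bar{X}) \equiv \langle (X_i - \langle X \rangle)(\bar{X} - \langle X \rangle) \rangle$
Note: $\langle X_i \rangle = \langle \bar{X} \rangle = \langle X \rangle$, and ${\rm Cov}(X_i, X_j) \equiv \sigma^2 \delta_{ij}$.
Shift coordinates to put $\langle X \rangle = 0$:
  ${\rm Cov}(X_i, \bar{X}) = \langle (X_i - 0)(\bar{X} - 0) \rangle = \left\langle X_i\, \dfrac{1}{N} \sum_k X_k \right\rangle = \dfrac{1}{N} \sum_k \langle X_i X_k \rangle = \dfrac{1}{N} \sum_k \sigma^2 \delta_{ik} = \dfrac{\sigma^2}{N}$
(Figure: slope = $1/N$.)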
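The result ${\rm Cov}(X_i, \bar{X}) = \sigma^2/N$ can be sanity-checked by simulation. The sketch below draws zero-mean Gaussian data (an assumption; the derivation only needs uncorrelated data with equal variance) and averages the product $X_1 \bar{X}$.

```python
import random

rng = random.Random(7)  # arbitrary fixed seed
n, sigma, trials = 4, 1.0, 50000

acc = 0.0
for _ in range(trials):
    x = [rng.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(x) / n
    acc += x[0] * xbar  # <X> = 0, so the covariance is just <X_1 * Xbar>

cov_estimate = acc / trials
print(cov_estimate, sigma**2 / n)  # the two numbers should be close
```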

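9 Sample Variance $S^2$: Unbiased for $\sigma^2$
  $S^2 \equiv \dfrac{1}{N-1} \sum_i (X_i - \bar{X})^2$
Why $N-1$, not $N$? Because $\bar{X}$ "chases" the dancing data points, removing 1 "degree of freedom" from the dance.
  $S^2 \sim \dfrac{\sigma^2}{N-1}\, \chi^2_{N-1}$
  $\langle S^2 \rangle = \dfrac{\sigma^2}{N-1} \langle \chi^2_{N-1} \rangle = \dfrac{\sigma^2}{N-1}\,(N-1) = \sigma^2$
  ${\rm Var}[S^2] = \dfrac{\sigma^4}{(N-1)^2}\, {\rm Var}[\chi^2_{N-1}] = \dfrac{\sigma^4\, 2(N-1)}{(N-1)^2} = \dfrac{2\sigma^4}{N-1}$
  $\dfrac{\sigma(S^2)}{\langle S^2 \rangle} = \left(\dfrac{2}{N-1}\right)^{1/2}$ = fractional accuracy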
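A short simulation illustrates the bias of the $1/N$ estimator, the unbiasedness of the $1/(N-1)$ estimator, and the variance $2\sigma^4/(N-1)$. The data are assumed Gaussian, and all names, the seed and the sample sizes are choices made for this example.

```python
import random


def variance_estimates(x):
    """Return (s2, S2): the 1/N and 1/(N-1) variance estimates about the sample mean."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    return ss / n, ss / (n - 1)


rng = random.Random(1)
n, sigma, trials = 5, 2.0, 40000

biased, unbiased = [], []
for _ in range(trials):
    s2, S2 = variance_estimates([rng.gauss(0.0, sigma) for _ in range(n)])
    biased.append(s2)
    unbiased.append(S2)

mean_biased = sum(biased) / trials
mean_unbiased = sum(unbiased) / trials
var_unbiased = sum((v - mean_unbiased) ** 2 for v in unbiased) / (trials - 1)

print(mean_biased, (n - 1) / n * sigma**2)   # biased low
print(mean_unbiased, sigma**2)               # unbiased
print(var_unbiased, 2 * sigma**4 / (n - 1))  # Var[S^2]
```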

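10 Degrees of Freedom (DoF)
$N$ data points: $\langle X_i \rangle = \langle X \rangle$, $\quad {\rm Cov}(X_i, X_j) = \sigma_i^2 \delta_{ij}$
  $\sum_i \left(\dfrac{X_i - \langle X \rangle}{\sigma_i}\right)^2 \sim \chi^2_N$. $N$ degrees of freedom.
If $\langle X \rangle$ is unknown, use $\hat{X}$ instead:
  $\sum_i \left(\dfrac{X_i - \hat{X}}{\sigma_i}\right)^2 \sim \chi^2_{N-1}$. $N-1$ degrees of freedom.
If $N = 1$ data point: $\hat{X} = X_1$
  $\left(\dfrac{X_1 - \langle X \rangle}{\sigma_1}\right)^2 \sim \chi^2_1$. 1 degree of freedom.
  $\left(\dfrac{X_1 - \hat{X}}{\sigma_1}\right)^2 = 0$. 0 degrees of freedom.
Fit $M$ parameters to $N$ data points:
  $\sum_i \left(\dfrac{X_i - \mu_i(\hat\alpha)}{\sigma_i}\right)^2 \sim \chi^2_{N-M}$. $N-M$ degrees of freedom.
Each fitted parameter removes 1 degree of freedom from the residuals.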
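The loss of one degree of freedom per fitted parameter can be seen in a simulation: the average $\chi^2$ about the true mean is close to $N$, while the average $\chi^2$ about the fitted optimal average is close to $N-1$. The error bars, seed and variable names below are illustrative assumptions.

```python
import random

rng = random.Random(3)
true_mean = 10.0
sigma = [0.5, 1.0, 1.5, 2.0, 1.0, 0.5]  # N = 6 error bars (made up)
weights = [1.0 / s**2 for s in sigma]
trials = 20000

sum_true = 0.0
sum_fit = 0.0
for _ in range(trials):
    x = [rng.gauss(true_mean, s) for s in sigma]
    # Optimal (inverse-variance weighted) average: one fitted parameter.
    xhat = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
    sum_true += sum(((xi - true_mean) / s) ** 2 for xi, s in zip(x, sigma))
    sum_fit += sum(((xi - xhat) / s) ** 2 for xi, s in zip(x, sigma))

mean_chi2_true = sum_true / trials
mean_chi2_fit = sum_fit / trials
print(mean_chi2_true)  # close to N = 6
print(mean_chi2_fit)   # close to N - 1 = 5
```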

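11 $(S^2)^{1/2}$ is biased for $\sigma$
The sample variance $S^2$ is unbiased for $\sigma^2$.
Is $(S^2)^{1/2}$ unbiased for $\sigma$? No. The square root introduces a bias:
  $\langle (S^2)^{1/2} \rangle < \sigma$, even though $\langle S^2 \rangle = \sigma^2$.
(Figure: $S$ versus $S^2$, marking $\sigma$ and $\sigma^2$.)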

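12 Example: Correct the Bias in $(S^2)^{1/2}$
Define $y(x) = x^b$.
Derivatives: $y'(x) = b\,x^{b-1}$, $\quad y''(x) = b(b-1)\,x^{b-2}$
Evaluate the bias by expanding about $\sigma^2$:
  $\langle (S^2)^b \rangle = \langle y(S^2) \rangle = y(\sigma^2) + y'(\sigma^2)\,\langle S^2 - \sigma^2 \rangle + \dfrac{1}{2}\, y''(\sigma^2)\,\langle (S^2 - \sigma^2)^2 \rangle + \ldots$
  $= \sigma^{2b} + \dfrac{1}{2}\, b(b-1)\, \sigma^{2(b-2)}\, {\rm Var}(S^2) + \ldots$
  $= \sigma^{2b} + \dfrac{1}{2}\, b(b-1)\, \sigma^{2(b-2)}\, \dfrac{2\sigma^4}{N-1} + \ldots$
  $= \sigma^{2b} \left(1 + \dfrac{b(b-1)}{N-1} + \ldots\right)$
For $b = 1/2$:
  $\langle (S^2)^{1/2} \rangle = \sigma \left(1 - \dfrac{1}{4(N-1)} + \ldots\right) = \sigma \left(\dfrac{4N-5}{4N-4}\right) + \ldots$
Bias-corrected: $\dfrac{4N-4}{4N-5}\,(S^2)^{1/2}$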
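For Gaussian data the expectation of $(S^2)^{1/2}$ is known in closed form (it is the mean of a chi distribution, rescaled), so the first-order correction factor $(4N-4)/(4N-5)$ can be compared against the exact value. This comparison is an addition for illustration; the function names are hypothetical.

```python
import math


def mean_sqrt_s2(n, sigma=1.0):
    """Exact <(S^2)^(1/2)> for n Gaussian data points, via the chi-distribution mean."""
    k = n - 1  # degrees of freedom of S^2
    return sigma * math.sqrt(2.0 / k) * math.gamma((k + 1) / 2.0) / math.gamma(k / 2.0)


def bias_correction(n):
    """First-order correction factor from the Taylor expansion."""
    return (4.0 * n - 4.0) / (4.0 * n - 5.0)


for n in (3, 5, 10, 30):
    raw = mean_sqrt_s2(n)
    corrected = bias_correction(n) * raw
    # raw is below 1 (biased low); corrected should sit much closer to 1.
    print(n, raw, corrected)
```

The correction comes from a truncated series, so a small residual bias remains at small $N$ and shrinks as $N$ grows.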

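13 Robust estimation methods
Robust => less sensitive to bad data.
Example: using the median rather than the mean.
The sample mean minimises the sample variance:
  $S^2 \equiv \dfrac{1}{N-1} \sum_i (X_i - \mu)^2$, $\quad \dfrac{\partial S^2}{\partial \mu} = 0$ for $\mu = \bar{X}$
The median $M$ minimises the Mean Absolute Deviation:
  ${\rm MAD} \equiv \dfrac{1}{N} \sum_i |X_i - \mu|$, $\quad \dfrac{\partial\,{\rm MAD}}{\partial \mu} = 0$ for $\mu = M \equiv {\rm Median}(X)$
(Figure: $S^2$ and MAD versus $\mu$, marking the mean and the median.)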

14 Mean vs Median
The median is less sensitive to outliers than the mean.
The median is unbiased, but not a minimum-variance estimator.
Note how the standard deviations of the median and of the mean vary with sample size.
(Figures: mean and median compared.)

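15 Proof that the Median minimises the MAD
Define the sign function
  $H(x) \equiv +1$ for $x > 0$, $\quad 0$ for $x = 0$, $\quad -1$ for $x < 0$
so that
  ${\rm MAD} \equiv \dfrac{1}{N} \sum_i |X_i - \mu| = \dfrac{1}{N} \sum_i (X_i - \mu)\, H(X_i - \mu)$
Then
  $\dfrac{d\,{\rm MAD}}{d\mu} = -\dfrac{1}{N} \sum_i \left[ H(X_i - \mu) + (X_i - \mu)\, H'(X_i - \mu) \right] = -\dfrac{1}{N} \sum_i H(X_i - \mu)$
  $= -\dfrac{1}{N} \left[ N(X_i > \mu) - N(X_i < \mu) \right] = 0$ if $\mu = {\rm median}(X)$
The second term vanishes since $\dfrac{dH}{dx} = 2\,\delta(x)$, so $H'(x) = 0$ whenever $x \neq 0$.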
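The counting argument can be checked on a small data set. The sketch below evaluates the slope $-[N(X_i > \mu) - N(X_i < \mu)]/N$ and also scans a grid of $\mu$ values for the smallest MAD; the data, grid and function names are assumptions for the example.

```python
import statistics


def mad(x, mu):
    """Mean absolute deviation of x about mu."""
    return sum(abs(xi - mu) for xi in x) / len(x)


def mad_slope(x, mu):
    """d(MAD)/d(mu): minus (number above mu minus number below mu) over N."""
    above = sum(1 for xi in x if xi > mu)
    below = sum(1 for xi in x if xi < mu)
    return -(above - below) / len(x)


x = [1.0, 2.0, 2.5, 4.0, 50.0]  # odd N, one wild outlier
med = statistics.median(x)

# Brute-force search for the mu that minimises the MAD.
grid = [i * 0.01 for i in range(6001)]
best_mu = min(grid, key=lambda mu: mad(x, mu))

print(med, best_mu, mad_slope(x, med))
```

With an even number of points the slope is zero everywhere between the two middle values, so the minimiser is an interval rather than a single point.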

16 Find the Median without Sorting
A useful algorithm (is it faster than sorting?):
Since $\sum_i \dfrac{X_i - M}{|X_i - M|} = 0$, first make a guess at $M$.
Then estimate a new
  $M = \dfrac{\sum_i X_i / |X_i - M|}{\sum_i 1 / |X_i - M|}$,
and iterate to convergence.
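The iteration can be sketched as a repeatedly reweighted average with weights $1/|X_i - M|$. The version below is one possible reading of the slide, not a definitive implementation: the starting guess (the mean), the stopping tolerance and the small floor `eps` that guards against division by zero when $M$ lands on a data point are all assumptions.

```python
def median_by_iteration(x, guess=None, tol=1e-9, max_iter=1000, eps=1e-12):
    """Estimate the median of x by iterating a 1/|x_i - M| weighted average."""
    m = sum(x) / len(x) if guess is None else guess
    for _ in range(max_iter):
        # eps floors the distance so a data point sitting exactly at m
        # gets a very large (but finite) weight.
        w = [1.0 / max(abs(xi - m), eps) for xi in x]
        m_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


data = [50.0, 2.5, 1.0, 4.0, 2.0]  # unsorted, odd N, one outlier
estimate = median_by_iteration(data)
print(estimate)  # approaches the median, 2.5
```

For an even number of points the condition on the slide holds anywhere between the two middle values, so the iteration may settle at a point in that interval rather than at its midpoint.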

17 Median Filter and Sigma-Clip
Median filter:
  Window of $N$ points centred at time $t$.
  medfilt($t$) is the median of the $N$ points.
Sigma-clip:
  Fit all points by minimising $\chi^2$.
  Set a threshold $K$ and check for outliers at $\pm K\sigma$ or more.
  Repeat the fit omitting the largest outlier.
  Iterate until the set of rejected points converges.
(Figures: a sliding window for the median filter; rejected points above and below the fit for sigma-clip.)
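Both procedures are short to write down. In the sketch below the window is specified by a hypothetical `half_width` parameter and is truncated at the ends of the series, and the sigma-clip fits the simplest model (a constant, via the optimal average) and rejects one point per pass without re-admitting points; these are simplifying assumptions, not details taken from the slide.

```python
import statistics


def median_filter(y, half_width):
    """Running median over a window of up to 2*half_width + 1 points."""
    out = []
    for i in range(len(y)):
        lo = max(0, i - half_width)
        hi = min(len(y), i + half_width + 1)
        out.append(statistics.median(y[lo:hi]))
    return out


def sigma_clip_mean(x, sigma, k=3.0):
    """Optimal average with iterative rejection of the worst outlier beyond k sigma.

    Returns the fitted mean and the indices of the points that were kept.
    """
    keep = list(range(len(x)))
    while True:
        w = [1.0 / sigma[i] ** 2 for i in keep]
        mu = sum(wi * x[i] for wi, i in zip(w, keep)) / sum(w)
        worst = max(keep, key=lambda i: abs(x[i] - mu) / sigma[i])
        # Stop when nothing exceeds the threshold (or too few points remain).
        if len(keep) <= 2 or abs(x[worst] - mu) / sigma[worst] < k:
            return mu, keep
        keep.remove(worst)


smoothed = median_filter([1, 1, 9, 1, 1], half_width=1)
mu, kept = sigma_clip_mean([10.1, 9.9, 10.0, 10.2, 25.0], [0.2] * 5, k=3.0)
print(smoothed)
print(mu, kept)
```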

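18 Various Badness-of-Fit Statistics
Sample Variance (minimised by the mean):
  $S^2 \equiv \dfrac{1}{N-1} \sum_i (X_i - \mu_i)^2$
Chi-squared (minimised by the optimal average $\hat{X}$):
  $\chi^2 \equiv \sum_i \left(\dfrac{X_i - \mu_i}{\sigma_i}\right)^2$
Mean Absolute Deviation (minimised by the median $M$):
  ${\rm MAD} \equiv \dfrac{1}{N} \sum_i |X_i - \mu_i|$
Sum of Absolute Normalised Errors:
  ${\rm SANE} \equiv \sum_i \dfrac{|X_i - \mu_i|}{\sigma_i}$
(Figure: badness functions of $\epsilon$ and $\eta$ for each statistic; sigma-clip at $\pm K\sigma$.)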
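The four statistics can be written as small functions of a constant model value $\mu$. The sketch below evaluates them at the mean and at the median of an illustrative data set with three good points and one bad one; the data and function names are assumptions for the example.

```python
import statistics


def sample_variance(x, mu):
    return sum((xi - mu) ** 2 for xi in x) / (len(x) - 1)


def chi_squared(x, mu, sigma):
    return sum(((xi - mu) / s) ** 2 for xi, s in zip(x, sigma))


def mean_abs_dev(x, mu):
    return sum(abs(xi - mu) for xi in x) / len(x)


def sane(x, mu, sigma):
    """Sum of absolute normalised errors."""
    return sum(abs(xi - mu) / s for xi, s in zip(x, sigma))


x = [9.8, 10.0, 10.2, 20.0]  # 3 good points, 1 bad
sigma = [0.2, 0.2, 0.2, 0.2]
mean = statistics.mean(x)      # pulled towards the bad point
median = statistics.median(x)  # stays with the good points

for name, mu in (("mean", mean), ("median", median)):
    print(name, mu,
          sample_variance(x, mu), chi_squared(x, mu, sigma),
          mean_abs_dev(x, mu), sane(x, mu, sigma))
```

The squared statistics come out smaller at the mean, and the absolute-value statistics smaller at the median, which is the sense in which the latter are more robust to the bad point.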

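19 $S^2$ = Sample Variance, MAD = Mean Absolute Deviation
Badness of Fit $S^2(\mu)$:
  $(N-1)\,S^2 = \sum_i \epsilon_i^2$, $\quad \epsilon_i \equiv X_i - \mu_i$
Badness of Fit MAD$(\mu)$, minimised at the median:
  $N\,{\rm MAD} = \sum_i |\epsilon_i|$
(Figures: $S^2(\mu)$ and MAD$(\mu)$ for 3 good points and 1 bad point, with and without the bad point.)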

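20 $\chi^2$ = Sum of Squared Normalised Errors, SANE = Sum of Absolute Normalised Errors
Badness of Fit $\chi^2(\mu)$:
  $\chi^2 \equiv \sum_i \eta_i^2$, $\quad \eta_i \equiv \dfrac{X_i - \mu_i}{\sigma_i}$
Badness of Fit SANE$(\mu)$:
  ${\rm SANE} \equiv \sum_i |\eta_i|$
(Figures: $\chi^2(\mu)$ and SANE$(\mu)$ for 3 good points and 1 bad point, with and without the bad point.)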

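21 $\chi^2$ = Sum of Squared Normalised Errors, and a clipped Badness-of-Fit Statistic
Badness of Fit $\chi^2(\mu)$:
  $\chi^2 \equiv \sum_i \eta_i^2$, $\quad \eta_i \equiv \dfrac{X_i - \mu_i}{\sigma_i}$
Badness of Fit BoF$(\mu)$:
  ${\rm BoF} \equiv \sum_i \left\{ 1 - \exp\left(-\eta_i^2 / 2\right) \right\}$
Note the local minimum.
(Figures: $\chi^2(\mu)$ and BoF$(\mu)$ for 3 good points and 1 bad point, with and without the bad point.)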
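The local minimum can be reproduced with a small scan over $\mu$. The sketch below assumes the clipped statistic has the Gaussian-shaped form $\sum_i \{1 - \exp(-\eta_i^2/2)\}$; that exact form, the data values and the function names are assumptions made for illustration.

```python
import math


def clipped_bof(x, mu, sigma):
    """Assumed clipped badness of fit: each point contributes at most 1."""
    return sum(1.0 - math.exp(-0.5 * ((xi - mu) / s) ** 2) for xi, s in zip(x, sigma))


def chi_squared(x, mu, sigma):
    return sum(((xi - mu) / s) ** 2 for xi, s in zip(x, sigma))


x = [9.8, 10.0, 10.2, 20.0]  # 3 good points, 1 bad
sigma = [1.0, 1.0, 1.0, 1.0]

grid = [5.0 + 0.01 * i for i in range(2001)]  # mu from 5 to 25
bof = [clipped_bof(x, mu, sigma) for mu in grid]

# Interior grid points lower than the left neighbour and not above the right one.
minima = [grid[i] for i in range(1, len(grid) - 1)
          if bof[i] < bof[i - 1] and bof[i] <= bof[i + 1]]
chi2_best = min(grid, key=lambda mu: chi_squared(x, mu, sigma))

print(minima)     # one minimum near the good points, one near the bad point
print(chi2_best)  # chi^2 has a single minimum, pulled towards the bad point
```

Because each point's contribution saturates, the bad point cannot drag the global minimum away from the good points, but it does create a second, shallower minimum of its own, so a minimiser started near it could get stuck there.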
