Least squares problems


1 Least squares problems
How to state and solve them, then evaluate their solutions
Stéphane Mottelet, Université de Technologie de Compiègne, 30 September 2016

2 Outline
1 Motivation and statistical framework
2 Maths reminder (survival kit)
3 Linear Least Squares (LLS)
4 Non Linear Least Squares (NLLS)
5 Statistical evaluation of solutions
6 Model selection

3 Motivation and statistical framework

4 Motivation: regression problem
Data: $(x_i, y_i)_{i=1..n}$; model: $y = f(x; \theta)$
- $x \in \mathbb{R}$: independent variable
- $y \in \mathbb{R}$: dependent variable (value found by observation)
- $\theta \in \mathbb{R}^p$: parameters
Regression problem: find $\theta$ such that the model best explains the data, i.e. $y_i$ close to $f(x_i; \theta)$ for $i = 1, \dots, n$.

5 Motivation: regression problem, example
Simple linear regression: given $(x_i, y_i) \in \mathbb{R}^2$, find $\theta_1, \theta_2$ such that the data fits the model $y = \theta_1 + \theta_2 x$.
How does one measure the fit/misfit?

6 Motivation: least squares method
The least squares method measures the fit with the function
$$S(\theta) = \sum_{i=1}^{n} \left(y_i - f(x_i; \theta)\right)^2,$$
and aims to find $\hat\theta$ such that $S(\hat\theta) \le S(\theta)$ for all $\theta \in \mathbb{R}^p$, or equivalently
$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^p} S(\theta).$$
Important issues:
- statistical interpretation
- existence, uniqueness and practical determination of $\hat\theta$ (algorithms)
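As a minimal illustration (not from the original slides), the criterion $S$ can be evaluated in Scilab for the simple linear model $y = \theta_1 + \theta_2 x$; the data vectors x and y are assumed to be already available as columns:

// least squares criterion for the model y = theta(1) + theta(2)*x
function s=S_crit(theta, x, y)
    r = y - (theta(1) + theta(2)*x);   // residual vector
    s = sum(r.^2);                     // sum of squared residuals
endfunction

// example: compare two candidate parameter vectors on synthetic data
x = (1:10)'; y = 2 + 3*x + 0.1*rand(10, 1, "normal");
disp(S_crit([2;3], x, y))              // small misfit near the data-generating parameters
disp(S_crit([0;0], x, y))              // much larger misfit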

7 Statistical framework: hypotheses
1. $(x_i)_{i=1..n}$ are given.
2. $(y_i)_{i=1..n}$ are samples of statistically independent random variables
$$y_i = f(x_i; \theta) + \varepsilon_i,$$
where $\varepsilon_i$ is a random variable with $E[\varepsilon_i] = 0$, $E[\varepsilon_i^2] = \sigma^2$ and density $\varepsilon \mapsto g(\varepsilon)$.
The probability density of $y_i$ is given by
$$\phi_i : \mathbb{R} \times \mathbb{R}^p \to \mathbb{R}, \qquad \phi_i(y; \theta) = g(y - f(x_i; \theta)),$$
and $E[y_i \mid \theta] = f(x_i; \theta)$.

8 Statistical framework: example
If $\varepsilon$ is normally distributed, i.e. $g(\varepsilon) = (\sigma\sqrt{2\pi})^{-1} \exp\!\left(-\frac{\varepsilon^2}{2\sigma^2}\right)$, then
$$\phi_i(y; \theta) = (\sigma\sqrt{2\pi})^{-1} \exp\!\left(-\frac{1}{2\sigma^2}\left(y - f(x_i; \theta)\right)^2\right).$$

9 Statistical framework: joint probability density and likelihood function
Joint density: when $\theta$ is given, as the $(y_i)$ are independent, the density of the whole vector $y = (y_1, \dots, y_n)$ is
$$\phi(y; \theta) = \prod_{i=1}^{n} \phi_i(y_i; \theta).$$
Interpretation: for $D \subset \mathbb{R}^n$,
$$\mathrm{Prob}[y \in D \mid \theta] = \int_D \phi(y; \theta)\, dy_1 \dots dy_n.$$
Likelihood function: when a sample of $y$ is given, $L(\theta; y) = \phi(y; \theta)$ is called the likelihood of the parameters $\theta$.

10 Statistical framework: maximum likelihood estimation
The maximum likelihood estimate of $\theta$ is the vector $\hat\theta$ defined by
$$\hat\theta = \arg\max_{\theta \in \mathbb{R}^p} L(\theta; y).$$
Under the Gaussian hypothesis,
$$L(\theta; y) = \prod_{i=1}^{n} (\sigma\sqrt{2\pi})^{-1} \exp\!\left(-\frac{1}{2\sigma^2}\left(y_i - f(x_i; \theta)\right)^2\right) = (\sigma\sqrt{2\pi})^{-n} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - f(x_i; \theta)\right)^2\right),$$
hence we recover the least squares solution:
$$\arg\max_{\theta \in \mathbb{R}^p} L(\theta; y) = \arg\min_{\theta \in \mathbb{R}^p} S(\theta).$$

11 Statistical framework: alternatives, least absolute deviation regression
Least absolute deviation regression: the misfit is measured by
$$S_1(\theta) = \sum_{i=1}^{n} \left|y_i - f(x_i; \theta)\right|.$$
Is $\hat\theta = \arg\min_{\theta \in \mathbb{R}^p} S_1(\theta)$ a maximum likelihood estimate? Yes, if $\varepsilon_i$ has a Laplace distribution
$$g(\varepsilon) = (\sigma\sqrt{2})^{-1} \exp\!\left(-\frac{\sqrt{2}}{\sigma}|\varepsilon|\right).$$
First issue: $S_1$ is not differentiable.

12 Statistical framework: alternatives, least absolute deviation regression
[figure: densities of Gaussian vs. Laplace random variables, both with zero mean and unit variance]
Second issue: the two statistical hypotheses are very different!

13 Statistical framework: take home message
Take home message #1: doing least squares regression means that you assume that the model error is Gaussian. However, if you have no idea about the model error:
1. the nice theoretical and computational framework you will get is worth making this assumption...
2. a posteriori goodness-of-fit tests can be used to assess the normality of errors.

14 Maths reminder (survival kit)

15 Maths reminder: matrix algebra
Notation: $A \in M_{n,m}(\mathbb{R})$, $x \in \mathbb{R}^n$,
$$A = \begin{pmatrix} a_{11} & \dots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nm} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
Product: for $B \in M_{m,p}(\mathbb{R})$, $C = AB \in M_{n,p}(\mathbb{R})$ with $c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}$.
Identity matrix: $I$, with ones on the diagonal and zeros elsewhere.

16 Maths reminder: matrix algebra
Transposition, inner product and norm: for $A \in M_{m,n}(\mathbb{R})$, $[A^\top]_{ij} = a_{ji}$.
For $x, y \in \mathbb{R}^n$,
$$\langle x, y \rangle = x^\top y = \sum_{i=1}^{n} x_i y_i, \qquad \|x\|^2 = x^\top x.$$

17 Maths reminder: matrix algebra
Linear dependence / independence:
- a set $\{x_1, \dots, x_m\}$ of vectors in $\mathbb{R}^n$ is dependent if some vector $x_j$ can be written as $x_j = \sum_{k=1,\, k \ne j}^{m} \alpha_k x_k$
- a set of vectors which is not dependent is called independent
- a set of $m > n$ vectors is necessarily dependent
- a set of $n$ independent vectors in $\mathbb{R}^n$ is called a basis
The rank of $A \in M_{n,m}(\mathbb{R})$ is the number of its linearly independent columns:
$$\mathrm{rank}(A) = m \iff \left(Ax = 0 \Rightarrow x = 0\right).$$

18 Maths reminder: linear system of equations
When $A$ is square, $\mathrm{rank}(A) = n$ is equivalent to the existence of $A^{-1}$ such that $A^{-1}A = AA^{-1} = I$.
When this property holds, for all $y \in \mathbb{R}^n$ the system of equations $Ax = y$ has a unique solution $x = A^{-1}y$.
Computation: Gauss elimination algorithm (no computation of $A^{-1}$); in Scilab/Matlab: x = A\y.

19 Maths reminder: differentiability
Definition: let $f : \mathbb{R}^n \to \mathbb{R}^m$ with components $f(x) = (f_1(x), \dots, f_m(x))^\top$, $f_i : \mathbb{R}^n \to \mathbb{R}$. Then $f$ is differentiable at $a \in \mathbb{R}^n$ if
$$f(a+h) = f(a) + J_f(a)\,h + \|h\|\,\varepsilon(h), \qquad \lim_{h \to 0} \varepsilon(h) = 0.$$
Jacobian matrix (partial derivatives): $[J_f(a)]_{ij} = \dfrac{\partial f_i}{\partial x_j}(a)$.
Gradient: if $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $a$,
$$f(a+h) = f(a) + \nabla f(a)^\top h + \|h\|\,\varepsilon(h), \qquad \lim_{h \to 0} \varepsilon(h) = 0.$$

20 Maths reminder: nonlinear system of equations
When $f : \mathbb{R}^n \to \mathbb{R}^n$, a solution $\hat x$ of the system of equations $f(\hat x) = 0$ can be found (or not) by Newton's method: given $x_0$, for each $k$:
1. consider the affine approximation of $f$ at $x_k$: $T(x) = f(x_k) + J_f(x_k)(x - x_k)$
2. take $x_{k+1}$ such that $T(x_{k+1}) = 0$, i.e.
$$x_{k+1} = x_k - J_f(x_k)^{-1} f(x_k).$$
Newton's method can be very fast... if $x_0$ is not too far from $\hat x$!
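A minimal Scilab sketch of this iteration (not part of the original slides; the 2x2 toy system below is purely illustrative):

// Newton's method for f(x) = 0 on a toy system (circle / line intersection)
function y=fun(x)
    y = [x(1)^2 + x(2)^2 - 1; x(1) - x(2)];
endfunction
function J=jacf(x)
    J = [2*x(1), 2*x(2); 1, -1];
endfunction

x = [1; 0];                        // starting point x0
for k = 1:10
    x = x - jacf(x) \ fun(x);      // solve J_f(x_k) d = f(x_k), then step
end
disp(x)                            // converges to (1/sqrt(2), 1/sqrt(2))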

21 Maths reminder: find a local minimum, gradient algorithm
When $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, a vector $\hat x$ satisfying $\nabla f(\hat x) = 0$ and $f(\hat x) \le f(x)$ for all $x \in \mathbb{R}^n$ can be found by a descent algorithm: given $x_0$, for each $k$:
1. select a direction $d_k$ such that $\nabla f(x_k)^\top d_k < 0$
2. select a step $\rho_k$ such that $x_{k+1} = x_k + \rho_k d_k$ satisfies (among other conditions) $f(x_{k+1}) < f(x_k)$.
The choice $d_k = -\nabla f(x_k)$ leads to the gradient algorithm.

22 Maths reminder: find a local minimum, gradient algorithm
[figure: gradient iterates $x_0, x_1, x_2, \dots$ converging towards a local minimizer]
$$x_{k+1} = x_k - \rho_k \nabla f(x_k).$$
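A fixed-step gradient descent in Scilab (an illustrative sketch, not from the slides; the quadratic objective and the step size are made up):

// fixed-step gradient descent on a smooth function
function y=f_obj(x)
    y = (x(1)-1)^2 + 10*(x(2)+2)^2;
endfunction
function g=gradf(x)
    g = [2*(x(1)-1); 20*(x(2)+2)];
endfunction

x = [0; 0]; rho = 0.04;            // starting point and step size
for k = 1:200
    x = x - rho*gradf(x);          // d_k = -grad f(x_k)
end
disp(x)                            // approaches the minimizer (1, -2)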

23 Linear Least Squares (LLS)

24 Linear least squares: linear models
The model $y = f(x; \theta)$ is linear w.r.t. $\theta$, i.e.
$$y = \sum_{k=1}^{p} \theta_k \phi_k(x), \qquad \phi_k : \mathbb{R} \to \mathbb{R}.$$
Examples:
- $y = \sum_{k=1}^{p} \theta_k x^{k-1}$
- $y = \sum_{k=1}^{p} \theta_k \cos\dfrac{2\pi(k-1)x}{T}$, where $T = x_n - x_1$
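For such models the fitting problem reduces to building the $n \times p$ design matrix with entries $\phi_k(x_i)$; a small Scilab sketch for the polynomial basis (the abscissae x and the order p below are arbitrary):

// design matrix A(i,k) = x_i^(k-1) for the polynomial model
x = linspace(0, 1, 20)';          // example abscissae
p = 4;                            // model order (arbitrary)
A = ones(x);                      // first column: x.^0
for k = 2:p
    A = [A, x.^(k-1)];            // append column phi_k(x) = x^(k-1)
end
disp(size(A))                     // 20 x 4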

25 Linear least squares: simple linear regression
$$S(\theta) = \sum_{i=1}^{n} (\theta_1 + \theta_2 x_i - y_i)^2 = \|r(\theta)\|^2,$$
with residual components
$$r_i(\theta) = [1,\ x_i] \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} - y_i.$$
For the whole residual vector,
$$r(\theta) = A\theta - y, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}.$$

26 Linear least squares: optimality conditions
Linear least squares problem: find
$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^p} S(\theta), \qquad S(\theta) = \|A\theta - y\|^2.$$
Necessary optimality condition: $\nabla S(\hat\theta) = 0$.
Compute the gradient by expanding $S(\theta + h)$.

27 Linear least squares: optimality conditions
$$\begin{aligned}
S(\theta + h) &= \|A(\theta + h) - y\|^2 = \|A\theta - y + Ah\|^2 \\
&= (A\theta - y + Ah)^\top (A\theta - y + Ah) \\
&= (A\theta - y)^\top (A\theta - y) + (A\theta - y)^\top Ah + (Ah)^\top (A\theta - y) + (Ah)^\top Ah \\
&= \|A\theta - y\|^2 + 2(A\theta - y)^\top Ah + \|Ah\|^2 \\
&= S(\theta) + \nabla S(\theta)^\top h + \|Ah\|^2,
\end{aligned}$$
hence
$$\nabla S(\theta) = 2A^\top (A\theta - y).$$

28 Linear least squares: optimality conditions
Theorem: a solution of the LLS problem is given by $\hat\theta$, solution of the normal equations
$$A^\top A\, \hat\theta = A^\top y;$$
moreover, if $\mathrm{rank}\,A = p$, then $\hat\theta$ is unique.
Proof:
$$S(\theta) = S(\hat\theta + \theta - \hat\theta) = S(\hat\theta) + \nabla S(\hat\theta)^\top (\theta - \hat\theta) + \|A(\theta - \hat\theta)\|^2 = S(\hat\theta) + \|A(\theta - \hat\theta)\|^2 \ge S(\hat\theta).$$
Uniqueness: if $S(\theta) = S(\hat\theta)$, then $\|A(\theta - \hat\theta)\|^2 = 0$, hence $A(\theta - \hat\theta) = 0$ and, since $\mathrm{rank}\,A = p$, $\theta = \hat\theta$.
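As a quick numerical check (a sketch on made-up data, not from the slides), solving the normal equations and using the backslash operator give the same estimate:

// compare the normal equations with the backslash solver on a toy line fit
x = (0:0.5:5)';
y = 1 + 2*x + 0.05*rand(size(x,1), 1, "normal");   // noisy straight line
A = [ones(x), x];
theta_ne = (A'*A) \ (A'*y);        // solve A'A theta = A'y
theta_bs = A \ y;                  // least squares solve (QR-based)
disp([theta_ne, theta_bs])         // the two columns coincide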

29 Linear least squares: simple linear regression
$\mathrm{rank}\,A = 2$ if there exist $i \ne j$ such that $x_i \ne x_j$.
Computations:
$$S_x = \sum_{i=1}^{n} x_i, \quad S_y = \sum_{i=1}^{n} y_i, \quad S_{xy} = \sum_{i=1}^{n} x_i y_i, \quad S_{xx} = \sum_{i=1}^{n} x_i^2,$$
$$A^\top A = \begin{pmatrix} n & S_x \\ S_x & S_{xx} \end{pmatrix}, \qquad A^\top y = \begin{pmatrix} S_y \\ S_{xy} \end{pmatrix},$$
$$\theta_1 = \frac{S_y S_{xx} - S_x S_{xy}}{n S_{xx} - S_x^2}, \qquad \theta_2 = \frac{n S_{xy} - S_x S_y}{n S_{xx} - S_x^2}.$$
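These closed-form formulas translate directly into Scilab (a sketch; x and y are assumed to be data vectors already in memory):

// closed-form simple linear regression coefficients
n = size(x, 1);
Sx = sum(x);     Sy = sum(y);
Sxy = sum(x.*y); Sxx = sum(x.^2);
d = n*Sxx - Sx^2;                  // nonzero as soon as rank A = 2
theta1 = (Sy*Sxx - Sx*Sxy) / d;
theta2 = (n*Sxy - Sx*Sy) / d;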

30 Linear least squares: practical resolution in Scilab or Matlab
A=[ones(x),x,x.^2]
theta=A\y
plot(x,y,"o",x,A*theta,"r")

32 Linear least squares: an interesting example
Find a circle which best fits $(x_i, y_i)_{i=1..n}$ in the plane. Algebraic distance:
$$d(a, b, R) = \sum_{i=1}^{n} \left((x_i - a)^2 + (y_i - b)^2 - R^2\right)^2 = \|r\|^2.$$
The residual is nonlinear w.r.t. $(a, b, R)$, but
$$r_i = R^2 - a^2 - b^2 + 2a x_i + 2b y_i - (x_i^2 + y_i^2) = [2x_i,\ 2y_i,\ 1] \begin{pmatrix} a \\ b \\ R^2 - a^2 - b^2 \end{pmatrix} - (x_i^2 + y_i^2)$$
is linear w.r.t. $\theta = (a, b, R^2 - a^2 - b^2)$.

33 Linear least squares: an interesting example
Standard form:
$$A = \begin{pmatrix} 2x_1 & 2y_1 & 1 \\ \vdots & \vdots & \vdots \\ 2x_n & 2y_n & 1 \end{pmatrix}, \qquad z = \begin{pmatrix} x_1^2 + y_1^2 \\ \vdots \\ x_n^2 + y_n^2 \end{pmatrix}, \qquad d(a, b, R) = \|A\theta - z\|^2.$$
In Scilab:
A=[2*x,2*y,ones(x)]
z=x.^2+y.^2
theta=A\z
a=theta(1)
b=theta(2)
R=sqrt(theta(3)+a^2+b^2)
t=linspace(0,2*%pi,100)
plot(x,y,"o",a+R*cos(t),b+R*sin(t))
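To try this on synthetic data (an illustrative sketch; the centre, radius and noise level below are made up):

// noisy points on a circle of centre (2,-1) and radius 3
t = linspace(0, 2*%pi, 40)';
x = 2 + 3*cos(t) + 0.05*rand(40, 1, "normal");
y = -1 + 3*sin(t) + 0.05*rand(40, 1, "normal");
// then run the fitting code above: A=[2*x,2*y,ones(x)], z=x.^2+y.^2, theta=A\z, ...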

35 Linear least squares: take home message
Take home message #2: solving a linear least squares problem is just a matter of linear algebra.

36 Non Linear Least Squares (NLLS)

37 Non linear least squares: example
Consider data $(x_i, y_i)$ to be fitted by the nonlinear model
$$y = f(x; \theta) = \exp(\theta_1 + \theta_2 x).$$
The log trick leads some people to minimize
$$S_{\log}(\theta) = \sum_{i=1}^{n} \left(\log y_i - (\theta_1 + \theta_2 x_i)\right)^2,$$
i.e. to do simple linear regression of $(\log y_i)$ against $(x_i)$, but this violates a fundamental hypothesis: if $y_i - f(x_i; \theta)$ is Gaussian, then $\log y_i - \log f(x_i; \theta)$ is not!

38 Non linear least squares: possible angles of attack
Recall that $S(\theta) = \|r(\theta)\|^2$ with $r_i(\theta) = f(x_i; \theta) - y_i$. A local minimum of $S$ can be found by different methods.
Compute a solution of the nonlinear system of equations
$$\nabla S(\theta) = 2 J_r(\theta)^\top r(\theta) = 0$$
with Newton's method:
- needs the Jacobian of the gradient itself (do you really want to compute second derivatives?)
- does not guarantee convergence towards a minimum

39 Non linear least squares: possible angles of attack
Use the spirit of Newton's method as follows: start with $\theta_0$ and, for each $k$, consider the development of the residual vector at $\theta_k$ with $h = \theta - \theta_k$,
$$r(\theta) = r(\theta_k) + J_r(\theta_k)(\theta - \theta_k) + \|\theta - \theta_k\|\,\varepsilon(\theta - \theta_k),$$
and take $\theta_{k+1}$ as the vector minimizing
$$S_k(\theta) = \|r(\theta_k) + J_r(\theta_k)(\theta - \theta_k)\|^2.$$
Finding $\theta_{k+1}$ is a LLS problem!

40 Non linear least squares: Gauss-Newton method
Original formulation:
$$\theta_{k+1} = \theta_k - \left[J_r(\theta_k)^\top J_r(\theta_k)\right]^{-1} J_r(\theta_k)^\top r(\theta_k) = \theta_k - \tfrac{1}{2}\left[J_r(\theta_k)^\top J_r(\theta_k)\right]^{-1} \nabla S(\theta_k).$$
What can you do when the rank of $J_r(\theta_k)$ is deficient? Pick a $\lambda > 0$ and take $\theta_{k+1}$ as the vector minimizing
$$\|r(\theta_k) + J_r(\theta_k)(\theta - \theta_k)\|^2 + \lambda \|\theta - \theta_k\|^2.$$
Levenberg-Marquardt method:
$$\theta_{k+1} = \theta_k - \tfrac{1}{2}\left[J_r(\theta_k)^\top J_r(\theta_k) + \lambda I\right]^{-1} \nabla S(\theta_k).$$
$\lambda$ allows one to balance between speed ($\lambda \to 0$) and robustness ($\lambda \to \infty$).
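A bare-bones Gauss-Newton loop follows directly from the formula above (a sketch only, with no step control; resid and jacr stand for user-supplied residual and Jacobian functions):

// naive Gauss-Newton iteration (no line search, no damping)
// resid(theta): residual vector r(theta); jacr(theta): Jacobian J_r(theta)
function theta=gauss_newton(theta, resid, jacr, niter)
    for k = 1:niter
        J = jacr(theta);
        r = resid(theta);
        theta = theta - (J'*J) \ (J'*r);   // add lambda*eye(J'*J) for Levenberg-Marquardt
    end
endfunction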

41 Non linear least squares: example 1
Consider again data $(x_i, y_i)$ to be fitted by the nonlinear model $f(x; \theta) = \exp(\theta_1 + \theta_2 x)$. The Jacobian of $r(\theta)$ is given by
$$J_r(\theta) = \begin{pmatrix} \exp(\theta_1 + \theta_2 x_1) & x_1 \exp(\theta_1 + \theta_2 x_1) \\ \vdots & \vdots \\ \exp(\theta_1 + \theta_2 x_n) & x_n \exp(\theta_1 + \theta_2 x_n) \end{pmatrix}.$$

42 Non linear least squares: example 1
In Scilab, use the lsqrsolve or leastsq function:
function r=resid(theta,n)
  r=exp(theta(1)+theta(2)*x)-y;
endfunction
function j=jac(theta,n)
  e=exp(theta(1)+theta(2)*x);
  j=[e x.*e];
endfunction

load("data_exp.dat")
theta0=[0;0];
theta=lsqrsolve(theta0,resid,30,jac);
plot(x,y,"ob",x,exp(theta(1)+theta(2)*x),"r")
Result: $\hat\theta = (1.001, \dots)$

43 Non linear least squares: example 2
Enzymatic kinetics:
$$s'(t) = \frac{\theta_2\, s(t)}{s(t) + \theta_3}, \quad t > 0, \qquad s(0) = \theta_1,$$
with $y_i$ = measurement of $s$ at time $t_i$, and
$$S(\theta) = \|r(\theta)\|^2, \qquad r_i(\theta) = \frac{y_i - s(t_i)}{\sigma_i}.$$
Individual weighting allows one to take into account different standard deviations of the measurements.

44 Non linear least squares: example 2
In Scilab, use the lsqrsolve or leastsq function:
function dsdt=michaelis(t,s,theta)
  dsdt=theta(2)*s/(s+theta(3))
endfunction
function r=resid(theta,n)
  s=ode(theta(1),0,t,list(michaelis,theta))
  r=(s-y)./sigma
endfunction

load("michaelis_data.dat")
theta0=[y(1);20;80];
theta=lsqrsolve(theta0,resid,n)
Result: $\hat\theta = (887.9, 37.6, 97.7)$
If not provided, the Jacobian $J_r(\theta)$ is approximated by finite differences (but analytical derivatives always speed up convergence).

45 Non linear least squares: take home message
Take home message #3: solving nonlinear least squares problems is not that difficult with adequate software and good starting values.

46 Statistical evaluation of solutions

47 Statistical evaluation of solutions: motivation
Since the data $(y_i)_{i=1..n}$ is a sample of random variables, $\hat\theta$ is a random variable too! Confidence intervals for $\hat\theta$ can be obtained by at least two methods:
- Monte-Carlo method: allows one to estimate the distribution of $\hat\theta$, but needs thousands of resamplings
- linearized statistics: very fast, but can be very approximate for high levels of measurement error

48 Statistical evaluation of solutions: Monte Carlo method
The Monte Carlo method is a resampling method, i.e. it works by generating new samples of synthetic measurements and redoing the estimation of $\hat\theta$. Here the model is $y = \theta_1 + \theta_2 x + \theta_3 x^2$ and the data is corrupted by noise with $\sigma = 1/2$.
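A compact sketch of such a resampling loop (illustrative only; the noise level sigma, the design matrix A and the fitted values yhat = A*theta are assumed to come from the original fit):

// Monte Carlo confidence intervals for a linear fit (sketch)
N = 2000;                                    // number of synthetic resamplings
thetas = zeros(3, N);
for m = 1:N
    ysim = yhat + sigma*rand(size(yhat,1), 1, "normal");  // synthetic measurements
    thetas(:, m) = A \ ysim;                 // re-estimate on each sample
end
// 95% confidence interval for each parameter from empirical quantiles
for j = 1:3
    s = gsort(thetas(j,:), "g", "i");        // sort in increasing order
    disp([s(ceil(0.025*N)), s(floor(0.975*N))])
end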

49 Statistical evaluation of solutions: Monte Carlo method
[figure: histograms of the resampled estimates of $\theta_1$, $\theta_2$, $\theta_3$]
At 95% confidence level: $\hat\theta_1 \in [0.99, 1.29]$, $\hat\theta_2 \in [-1.20, -0.85]$, $\hat\theta_3 \in [-2.57, -1.91]$.

50 Statistical evaluation of solutions: linearized statistics
Define the weighted residual $r(\theta)$ by
$$r_i(\theta) = \frac{y_i - f(x_i; \theta)}{\sigma_i},$$
where $\sigma_i$ is the standard deviation of $y_i$. The covariance matrix of $\hat\theta$ can be approximated by $V(\hat\theta) = F(\hat\theta)^{-1}$, where $F(\hat\theta)$ is the Fisher information matrix, given by
$$F(\theta) = J_r(\theta)^\top J_r(\theta).$$
For example, when $\sigma_i = \sigma$ for all $i$, in LLS problems
$$V(\hat\theta) = \sigma^2 (A^\top A)^{-1}.$$

51 Statistical evaluation of solutions: linearized statistics
d=derivative(resid,theta)
V=inv(d'*d)
sigma_theta=sqrt(diag(V))
// fractile of the Student distribution
t_alpha=cdft("T",n-3,0.975,0.025);
thetamin=theta-t_alpha*sigma_theta
thetamax=theta+t_alpha*sigma_theta
With $\hat\theta = (887.9, 37.6, 97.7)$, at 95% confidence level:
$\hat\theta_1 \in [856.68,\ \dots]$, $\hat\theta_2 \in [34.13, 41.21]$, $\hat\theta_3 \in [93.37,\ \dots]$.

52 Model selection

53 Model selection: motivation. Which model is the best?
[figure: the same data fitted with polynomial models of increasing order]

54 Model selection: motivation. Which model is the best?
On the previous slide the data has been fitted with the model
$$y = \sum_{k=0}^{p} \theta_k x^k$$
for several values of $p$. Considering $S(\hat\theta)$ as a function of the model order $p$ does not help much: the training misfit keeps decreasing as $p$ grows.
[table of $S(\hat\theta)$ versus model order $p$; the values were lost in transcription]

55 Model selection: validation
Validation is the key to model selection:
1. Define two sets of data: $V \subset \{1, \dots, n\}$ for validation and $T = \{1, \dots, n\} \setminus V$ for model training.
2. For each value of the model order $p$:
   - compute the optimal parameters with the training data, $\hat\theta_p = \arg\min_{\theta \in \mathbb{R}^p} \sum_{i \in T} (y_i - f(x_i; \theta))^2$
   - compute the validation residual $S_V(\hat\theta_p) = \sum_{i \in V} (y_i - f(x_i; \hat\theta_p))^2$
A Scilab sketch of this procedure is given below.
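A possible transcription of this loop (a sketch under assumptions: x and y are column data vectors, the model is the polynomial one of slide 54, and the validation set is a simple every-other-point split):

// train/validation split for polynomial model selection (sketch)
n = size(x, 1);
iV = 2:2:n;  iT = 1:2:n;                     // validation / training indices
SV = zeros(1, 8);
for p = 0:7
    A = ones(x);
    for k = 1:p, A = [A, x.^k]; end          // design matrix columns x^0 ... x^p
    theta_p = A(iT,:) \ y(iT);               // fit on the training data only
    SV(p+1) = sum((y(iV) - A(iV,:)*theta_p).^2);  // validation residual
end
disp(SV')                                     // choose the order p minimizing SV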

56 Model selection: validation
Validation helps a lot: here the best model order is clearly $p = 3$!
[table of $S_V(\hat\theta_p)$ versus model order $p$; the values were lost in transcription]

57 Statistical evaluation and model selection: take home message
Take home message #4: always evaluate your models, either by computing confidence intervals for the parameters or by using validation.
