Regression and Prediction
|
|
- Doris Bryan
- 5 years ago
- Views:
Transcription
1 -755 Machine Learning for Signal Processing Regression and Prediction Class Oct 202 Instructor: Bhisha Raj 23 Oct /8797
2 Matri Identities f 2... D df df d d df d 2 d2... df dd dd he derivative of a scalar function w.r.t. a vector is a vector he derivative w.r.t. a matri is a matri 23 Oct /8797 2
3 Matri Identities f.. D D D 2D DD df df d df d d df d d df d D d df.. df d d d d22.. d2d df df df dd dd2 d dd dd2 ddd 2 2 D D DD he derivative of a scalar function w.r.t. a vector is a vector he derivative w.r.t. a matri is a matri 23 Oct /8797 3
4 Matri Identities F F F F FN D df d d df df2 df d 2... d dfn.. dfn d d df d d2 df2 d d2.. dfn d d df d dd.. df2.. d.. dd.... dfn d d D D D D he derivative of a vector function w.r.t. a vector is a matri Note transposition of order 23 Oct /8797 4
5 Derivatives, UV, N UV N NUV UVN In general: Differentiating an MN function b a UV argument results in an MNUV tensor derivative 23 Oct /8797 5
6 Matri derivative identities d Xa Xda d a X X da X is a matri, a is a vector. Solution ma also be X d AX da X ; d XA X da A is a matri d d a Xa a X X da A XA XAA AA X trace d trace d trace X X da Some basic linear and quadratic identities 23 Oct /8797 6
7 A Common Problem Can ou spot the glitches? 23 Oct /8797 7
8 How to fi this problem? Glitches in audio Must be detected How? hen what? Glitches must be fied Delete the glitch Results in a hole Fill in the hole How? 23 Oct /8797 8
9 Interpolation.. Etend the curve on the left to predict the values in the blan region Forward prediction Etend the blue curve on the right leftwards to predict the blan region How? Bacward prediction Regression analsis.. 23 Oct /8797 9
10 Detecting the Glitch OK NO OK Regression-based reconstruction can be done anwhere Reconstructed value will not match actual value Large error of reconstruction identifies glitches 23 Oct /8797 0
11 What is a regression Analzing relationship between variables Epressed in man forms Wiipedia Linear regression, Simple regression, Ordinar least squares, Polnomial regression, General linear model, Generalized linear model, Discrete choice, Logistic regression, Multinomial logit, Mied logit, Probit, Multinomial probit,. Generall a tool to predict variables 23 Oct /8797
12 Regressions for prediction = f; Q + e Different possibilities is a scalar Y is real Y is categorical classification is a vector is a vector is a set of real valued variables is a set of categorical variables is a combination of the two f. is a linear or affine function f. is a non-linear function f. is a time-series model 23 Oct /8797 2
13 A linear regression Y Assumption: relationship between variables is linear X A linear trend ma be found relating and = dependent variable = eplanator variable Given, can be predicted as an affine function of 23 Oct /8797 3
14 An imaginar regression.. Chec this shit out Fig.. hat's bonafide, 00%-real data, m friends. I too it mself over the course of two wees. And this was not a leisurel two wees, either; I busted m ass da and night in order to provide ou with nothing but the best data possible. Now, let's loo a bit more closel at this data, remembering that it is absolutel first-rate. Do ou see the eponential dependence? I sure don't. I see a bunch of crap. Christ, this was such a waste of m time. Baning on m hopes that whoever grades this will just loo at the pictures, I drew an eponential through m noise. I believe the apparent legitimac is enhanced b the fact that I used a complicated computer program to mae the fit. I understand this is the same process b which the top quar was discovered. 23 Oct /8797 4
15 Linear Regressions = A + b + e e = prediction error Given a training set of {, } values: estimate A and b = A + b + e 2 = A 2 + b + e 2 3 = A 3 + b+ e 3 If A and b are well estimated, prediction error will be small 23 Oct /8797 5
16 Linear Regression to a scalar = a + b + e 2 = a 2 + b + e 2 3 = a 3 + b + e 3 Define: [ 3...] 2 e [ e e...] 3 2 e X A a b Rewrite A X e 23 Oct /8797 6
17 Learning the parameters A X e ˆ A X Assuming no error Given training data: several, Can define a divergence : D, ŷ Measures how much hat differs from Ideall, if the model is accurate this should be small Estimate A, b to minimize D, ŷ 23 Oct /8797 7
18 he prediction error as divergence Define the divergence as the sum of the squared error in predicting 8 e X A = a + b + e 2 = a 2 + b + e 2 3 = a 3 + b + e 3... ˆ e e e E D, b b b a a a 2 X A X A X A E 23 Oct /8797
19 Prediction error as divergence = a + e e = prediction error Find the slope a such that the total squared length of the error lines is minimized 23 Oct /8797 9
20 Solving a linear regression A X e Minimize squared error E X A 2 A X A X XX A - 2X Differentiating w.r.t A and equating to 0 A 2A XX - 2X A 0 de d A A - X XX pinvx A XX - X 23 Oct /
21 An Aside What happens if we minimize the perpendicular instead? 23 Oct /8797 2
22 Regression in multiple dimensions = A + b + e 2 = A 2 + b + e 2 3 = A 3 + b + e 3 i is a vector ij = j th component of vector i Also called multiple regression Equivalent of saing: Fundamentall no different from N separate single regressions a i = i th column of A b i = i th component of b = a + b + e = A + b + e 2 = a b 2 + e 2 3 = a b 3 + e 3 But we can use the relationship between s to our benefit 23 Oct /
23 Multiple Regression Y [...] 3 2 E [ e e...] 3 2 e X D vector of ones A A b Y A X E DIV i i A i b trace Y A Differentiating and equating to 0 2 X Y A X ddiv 2A XX - 2YX da 0 A - YX XX Ypinv X A XX XY 23 Oct /
24 A Different Perspective = + is a nois reading of A Error e is Gaussian e ~ Estimate A from A e N0, 2 I Y [ 2... N ] X [ 2... N ] 23 Oct /
25 he Lielihood of the data A e e ~ N0, 2 I Probabilit of observing a specific, given, for a particular matri A P ; A N A, 2 I Probabilit of the collection: Y [ 2... N ] X [ 2... N ] P Y X; A i N A Assuming IID for convenience not necessar 23 Oct / i, 2 I
26 A Maimum Lielihood Estimate A e e ~ N0, 2 I Y [ 2... N ] X [ 2... N ] P Y X ep A 2 D 2 i log P Y X; A C i A 2 i i 2 C trace Y A X Y 2 Maimizing the log probabilit is identical to minimizing the trace Identical to the least squares solution i 2 A X 2 A - YX XX Ypinv X A XX XY 23 Oct /
27 Predicting an output From a collection of training data, have learned A Given for a new instance, but not, what is? Simple solution: ˆ A X 23 Oct /
28 Appling it to our problem Prediction b regression Forward regression t = a t- + a 2 t-2 a t- +e t Bacward regression t = b t+ + b 2 t+2 b t+ +e t 23 Oct /
29 Appling it to our problem Forward prediction 23 Oct / K t t K t K t K t t K t t K t t e e e a t e X a t pinv a t X
30 Appling it to our problem Bacward prediction 23 Oct / e e e K t K t K t K t K t t K t t t K t K t b e X b t pinv b t X
31 Finding the burst At each time Learn a forward predictor a t At each time, predict net sample t est = S i a t, t- Compute error: ferr t = t - t est 2 Learn a bacward predict and compute bacward error berr t Compute average prediction error over window, threshold 23 Oct /8797 3
32 Filling the hole Learn forward predictor at left edge of hole For each missing sample At each time, predict net sample t est = S i a t, t- Use estimated samples if real samples are not available Learn bacward predictor at left edge of hole For each missing sample At each time, predict net sample t est = S i b t, t+ Use estimated samples if real samples are not available Average forward and bacward predictions 23 Oct /
33 Reconstruction zoom in Reconstruction area Net glitch Distorted signal Recovered signal Interpolation result Actual data 23 Oct /
34 Incrementall learning the regression A XX - XY Requires nowledge of all, pairs Can we learn A incrementall instead? As data comes in? he Widrow Hoff rule a t a Note the structure t t t ˆ t t ˆ t a t Can also be done in batch mode! error Scalar prediction version 23 Oct /
35 Predicting a value A - XX XY A YX XX ˆ What are we doing eactl? Let ˆ YXˆ ˆ 2 C ˆ i ˆ i C XX Normalizing and rotating space he rotation is irrelevant ˆ i Weighted combination of inputs 23 Oct /
36 Relationships are not alwas linear How do we model these? Multiple solutions 23 Oct /
37 Non-linear regression = j+e φ [ 2... N ] X X [ φ φ 2... φ K ] Y = AX+e Replace X with X in earlier equations for solution A X X - X Y 23 Oct /
38 What we are doing Finding the optimal combination of various function Remind ou of something? 23 Oct /
39 Being non-commital: Local Regression Regression is usuall trained over the entire data Must appl everwhere How about doing this locall? ˆ YXˆ For an ˆ i ˆ i ˆ d, i i i i C ii e i e 23 Oct /
40 Local Regression he resulting regression is dependent on! ˆ d, i i i e d, i i No closed form solution But can be highl accurate But what is d,?? i 2 23 Oct /
41 Kernel Regression Actuall a non-parametric MAP estimator of Note an estimator of, not parameters of regression he Kernel is the ernel of a parzen window But first.. MAP estimators.. 23 Oct / i i h i i i h K K ˆ
42 Map Estimators MAP Maimum A Posteriori: Find a best guess for in a statistical sense, given that we now = argma Y PY ML Maimum Lielihood: Find that value of Y for which the statistical best guess of X would have been the observed X = argma Y P Y MAP is simpler to visualize 23 Oct /
43 MAP estimation: Gaussian PDF Assume X and Y are jointl Gaussian he parameters of the F Gaussian are learned from training data F0 23 Oct /
44 Learning the parameters of the Gaussian z z N N i z i C z N N i z z i z i z z C z C C XX YX C C XY YY 23 Oct /
45 Learning the parameters of the Gaussian N z N z z i Cz zi z zi z N N i z N i C z i C C N N C i XY i i N i XX YX C C XY YY 23 Oct /
46 MAP estimation: Gaussian PDF Assume X and Y are jointl Gaussian he parameters of the F Gaussian are learned from training data F0 23 Oct /
47 MAP Estimator for Gaussian RV Assume X and Y are jointl Gaussian Level set of Gaussian he parameters of the Gaussian are learned from training data Now we are given an X, but no Y What is Y? 23 Oct /
48 MAP estimator for Gaussian RV 0 23 Oct /
49 MAP estimation: Gaussian PDF F 23 Oct /
50 MAP estimation: he Gaussian at a particular value of X F 0 23 Oct /
51 MAP estimation: he Gaussian at a particular value of X Most liel value F 0 23 Oct /8797 5
52 MAP Estimation of a Gaussian RV Y = argma P X??? 23 Oct /
53 MAP Estimation of a Gaussian RV 23 Oct /
54 MAP Estimation of a Gaussian RV 23 Oct /
55 So what is this value? Clearl a line Equation of Line: ˆ Y C YX C XX Scalar version given; vector version is identical ˆ C Y YX C XX Derivation? Later in the program a bit 23 Oct /
56 his is a multiple regression ˆ Y C YX C XX his is the MAP estimate of NO the regression parameter What about the ML estimate of Again, ML estimate of, not regression parameter 23 Oct /
57 Its also a minimum-mean-squared error estimate General principle of MMSE estimation: is unnown, is nown Must estimate it such that the epected squared error is minimized Err E[ ˆ 2 ] Minimize above term 23 Oct /
58 Its also a minimum-mean-squared error estimate Minimize error: Differentiating and equating to 0: 23 Oct / ] ˆ ˆ [ ] ˆ [ 2 E E Err ] [ 2ˆ ˆ ˆ ] [ ] 2ˆ ˆ ˆ [ E E E Err 0 ˆ ] [ 2 ˆ 2ˆ ] 2ˆ ˆ ˆ [ 2 d E d E derr ] [ ˆ E he MMSE estimate is the mean of the distribution
59 For the Gaussian: MAP = MMSE Most liel value is also he MEAN value Would be true of an smmetric distribution 23 Oct /
60 MMSE estimates for miture distributions 60 Let P X be a miture densit he MMSE estimate of is given b Just a weighted combination of the MMSE estimates from the component distributions, P P P d P P E, ] [ d P P, ], [ E P 23 Oct /8797
61 MMSE estimates from a Gaussian miture 23 Oct / Let P is also a Gaussian miture Let P, be a Gaussian Miture, ; N P P P S z z, z,,,, P P P P P P P P P P P P,
62 MMSE estimates from a Gaussian miture 23 Oct / Let P is a Gaussian Miture P P P, ], ; ];[ ; [,,,,,,,, C C C C N P, ;,,,,, Q C C N P Q C C N P P, ;,,,,
63 MMSE estimates from a Gaussian miture 23 Oct / E[ ] is also a miture P[ ] is a miture densit Q C C N P P, ;,,,, E P E ], [ ] [ C C P E ] [,,,,
64 MMSE estimates from a Gaussian miture A miture of estimates from individual Gaussians 23 Oct /
65 MMSE with GMM: Voice ransformation - Festvo GMM transformation suite oda awb bdl jm slt awb bdl jm slt 23 Oct /
66 Voice Morphing Align training recordings from both speaers Cepstral vector sequence Learn a GMM on joint vectors Given speech from one speaer, find MMSE estimate of the other Snthesize from cepstra 23 Oct /
67 A problem with regressions A XX - XY ML fit is sensitive Error is squared Small variations in data large variations in weights Outliers affect it adversel Unstable If dimension of X >= no. of instances XX is not invertible 23 Oct /
68 MAP estimation of weights a =a X+e Assume weights drawn from a Gaussian Pa = N0, 2 I Ma. Lielihood estimate Maimum a posteriori estimate aˆ arg ma a X aˆ arg ma a log P X; a log P a, X arg ma A log P e X, a P a 23 Oct /
69 MAP estimation of weights Similar to ML estimate with an additional term 23 Oct / , log arg ma, log arg ma ˆ a X a X a a A A P P P Pa = N0, 2 I Log Pa = C log a 2 C P 2 log 2 X a X a X, a a a X a X a a A C log ' arg ma ˆ
70 MAP estimate of weights dl 2a XX 2X 2I da 0 a XX I - XY Equivalent to diagonal loading of correlation matri Improves condition number of correlation matri Can be inverted with greater stabilit Will not affect the estimation from well-conditioned data Also called ihonov Regularization Dual form: Ridge regression MAP estimate of weights Not to be confused with MAP estimate of Y 23 Oct /
71 MAP estimate priors Left: Gaussian Prior on W Right: Laplacian Prior 23 Oct /8797 7
72 MAP estimation of weights with laplacian prior Assume weights drawn from a Laplacian Pa = l - ep-l - a Maimum a posteriori estimate aˆ arg ma A C' a X a X l a No closed form solution Quadratic programming solution required Non-trivial 23 Oct /
73 MAP estimation of weights with laplacian prior Assume weights drawn from a Laplacian Pa = l - ep-l - a Maimum a posteriori estimate aˆ arg ma A C' a X a X l a Identical to L regularized least-squares estimation 23 Oct /
74 L-regularized LSE No closed form solution aˆ arg ma A C' a X a X l Quadratic programming solutions required a Dual formulation aˆ arg ma A C ' a X a X subject to a t LASSO Least absolute shrinage and selection operator 23 Oct /
75 LASSO Algorithms Various conve optimization algorithms LARS: Least angle regression Pathwise coordinate descent.. Matlab code available from web 23 Oct /
76 Regularized least squares Image Credit: ibshirani Regularization results in selection of suboptimal in leastsquares sense solution One of the loci outside center ihonov regularization selects shortest solution L regularization selects sparsest solution 23 Oct /
77 LASSO and Compressive Sensing Y = X a Given Y and X, estimate sparse W LASSO: CS: X = eplanator variable Y = dependent variable a = weights of regression X = measurement matri Y = measurement a = data 23 Oct /
78 An interesting problem: Predicting War! Economists measure a number of social indicators for countries weel Happiness inde Hunger inde Freedom inde witter records Question: Will there be a revolution or war net wee? 23 Oct /
79 An interesting problem: Predicting War! Issues: Dissatisfaction builds up not an instantaneous phenomenon Usuall War / rebellion build up much faster Often in hours Important to predict Preparedness for securit Economic impact 23 Oct /
80 Predicting War O O 2 O 3 O 4 O 5 O 6 O 7 O 8 W W W W W W W W S S S S S S S S Given Sequence of economic indicators for each wee Sequence of unrest marers for each wee At the end of each wee we now if war happened or not that wee Predict probabilit of unrest net wee w w2 w3 w4 w5w6 w7w8 his could be a new unrest or persistence of a current one 23 Oct /
81 A Step Aside: Predicting ime Series An HMM is a model for time-series data How can we use it predict the future? 23 Oct /8797 8
82 Predicting with an HMM Given Observations O..O t All HMM parameters Learned from some training data Must estimate future observation O t+ Estimate must consider entire histor O..O t No nowledge of actual state of the process at an time 23 Oct /
83 Predicting with an HMM a s, t P O, O2,..., Ot, state t s as,t s t t+ time Given O..O t Compute PO.. O t,s Using the forward algorithm computes as,t P s t s O.. t s' P st s, O.. t P s s', O t 23 Oct / t a s, t a s', t s'
84 Predicting with an HMM s as,t+ Given Ps t =s O..t for all s Ps t+ = s O..t = S s Ps t =s O..t Ps s PO t+,s O..t = PO s Ps t+ =s O..t PO t+ O..t = S s PO t+,s O..t = S s PO s Ps t+ =s O..t his is a miture distribution 23 Oct /8797 time 84
85 Predicting with an HMM s as,t+ PO t+ O..t = S s PO t+,s O..t = S s PO s Ps t+ =s O.. time MMSE estimate of O t+ given O..t E[O t+ O..t ] = S s Ps t+ =s O.. E[O s] A weighted sum of the state means 23 Oct /
86 Predicting with an HMM MMSE Estimate of O t+ = E[O t+ O.. ] E[O t+ O..t ] = S s Ps t+ =s O.. E[O s] If PO s is a GMM EO s = S P s,s 23 Oct / s s s t t w O s P O,,.. ˆ s s s s t w s t s t O,, ' ',, ˆ a a
87 Predicting War rain an HMM on z = [w, s] After the t th wee, predict probabilit distribution: Pz t z z t = Pw, z z..z t Marginalize out not nown for net wee War? E[w z..z t ] P w z.. t P w, s z.. t ds 23 Oct /
Machine Learning for Signal Processing Bayes Classification
Machine Learning for Signal Processing Bayes Classification Class 16. 24 Oct 2017 Instructor: Bhiksha Raj - Abelino Jimenez 11755/18797 1 Recap: KNN A very effective and simple way of performing classification
More informationKernel PCA, clustering and canonical correlation analysis
ernel PCA, clustering and canonical correlation analsis Le Song Machine Learning II: Advanced opics CSE 8803ML, Spring 2012 Support Vector Machines (SVM) 1 min w 2 w w + C j ξ j s. t. w j + b j 1 ξ j,
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic
More informationMachine Learning for Signal Processing Regression and Prediction
Machne Learnng for Sgnal Processng Regresson and Predcton Class 4. 7 Oct 0 Instructor: Bhsha Raj 7 Oct 03 755/8797 f... D df Matr Identtes df d d df d d... df dd dd he dervatve of a scalar functon w.r.t.
More informationWhere now? Machine Learning and Bayesian Inference
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone etension 67 Email: sbh@clcamacuk wwwclcamacuk/ sbh/ Where now? There are some simple take-home messages from
More information6. Vector Random Variables
6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following
More informationCSE 546 Midterm Exam, Fall 2014
CSE 546 Midterm Eam, Fall 2014 1. Personal info: Name: UW NetID: Student ID: 2. There should be 14 numbered pages in this eam (including this cover sheet). 3. You can use an material ou brought: an book,
More informationLECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form
LECTURE # - EURAL COPUTATIO, Feb 4, 4 Linear Regression Assumes a functional form f (, θ) = θ θ θ K θ (Eq) where = (,, ) are the attributes and θ = (θ, θ, θ ) are the function parameters Eample: f (, θ)
More informationBayesian Approach 2. CSC412 Probabilistic Learning & Reasoning
CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationLinear regression Class 25, Jeremy Orloff and Jonathan Bloom
1 Learning Goals Linear regression Class 25, 18.05 Jerem Orloff and Jonathan Bloom 1. Be able to use the method of least squares to fit a line to bivariate data. 2. Be able to give a formula for the total
More informationMachine Learning for Signal Processing Regression and Prediction
Machne Learnng for Sgnal Processng Regresson and Predcton Class 4. 5 Oct 06 Instructor: Bhsha Raj 755/8797 A Common Problem Can ou spot the gltches? 755/8797 How to f ths problem? Gltches n audo Must be
More information6.867 Machine learning
6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More information2 Statistical Estimation: Basic Concepts
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:
More informationLinear Models for Regression
Linear Models for Regression Machine Learning Torsten Möller Möller/Mori 1 Reading Chapter 3 of Pattern Recognition and Machine Learning by Bishop Chapter 3+5+6+7 of The Elements of Statistical Learning
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationHTF: Ch4 B: Ch4. Linear Classifiers. R Greiner Cmput 466/551
HTF: Ch4 B: Ch4 Linear Classifiers R Greiner Cmput 466/55 Outline Framework Eact Minimize Mistakes Perceptron Training Matri inversion LMS Logistic Regression Ma Likelihood Estimation MLE of P Gradient
More informationUnit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents
Unit NOTES Honors Common Core Math Da : Properties of Eponents Warm-Up: Before we begin toda s lesson, how much do ou remember about eponents? Use epanded form to write the rules for the eponents. OBJECTIVE
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationMACHINE LEARNING ADVANCED MACHINE LEARNING
MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 22 MACHINE LEARNING Discrete Probabilities Consider two variables and y taking discrete
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationLeast Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD
Least Squares Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 75A Winter 0 - UCSD (Unweighted) Least Squares Assume linearity in the unnown, deterministic model parameters Scalar, additive noise model: y f (
More informationStatistics for scientists and engineers
Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and can be printed and given to the
More informationVector Calculus Review
Course Instructor Dr. Ramond C. Rumpf Office: A-337 Phone: (915) 747-6958 E-Mail: rcrumpf@utep.edu Vector Calculus Review EE3321 Electromagnetic Field Theor Outline Mathematical Preliminaries Phasors,
More informationCOMS 4771 Regression. Nakul Verma
COMS 4771 Regression Nakul Verma Last time Support Vector Machines Maximum Margin formulation Constrained Optimization Lagrange Duality Theory Convex Optimization SVM dual and Interpretation How get the
More informationCSE 559A: Computer Vision
CSE 559A: Computer Vision Fall 208: T-R: :30-pm @ Lopata 0 Instructor: Ayan Chakrabarti (ayan@wustl.edu). Course Staff: Zhihao ia, Charlie Wu, Han Liu http://www.cse.wustl.edu/~ayan/courses/cse559a/ Sep
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationMachine Learning Lecture Notes
Machine Learning Lecture Notes Predrag Radivojac January 25, 205 Basic Principles of Parameter Estimation In probabilistic modeling, we are typically presented with a set of observations and the objective
More informationOverview. IAML: Linear Regression. Examples of regression problems. The Regression Problem
3 / 38 Overview 4 / 38 IAML: Linear Regression Nigel Goddard School of Informatics Semester The linear model Fitting the linear model to data Probabilistic interpretation of the error function Eamples
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationMAT 127: Calculus C, Fall 2010 Solutions to Midterm I
MAT 7: Calculus C, Fall 00 Solutions to Midterm I Problem (0pts) Consider the four differential equations for = (): (a) = ( + ) (b) = ( + ) (c) = e + (d) = e. Each of the four diagrams below shows a solution
More informationHidden Markov Models
Hidden Markov Models CI/CI(CS) UE, SS 2015 Christian Knoll Signal Processing and Speech Communication Laboratory Graz University of Technology June 23, 2015 CI/CI(CS) SS 2015 June 23, 2015 Slide 1/26 Content
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationSquare Estimation by Matrix Inverse Lemma 1
Proof of Two Conclusions Associated Linear Minimum Mean Square Estimation b Matri Inverse Lemma 1 Jianping Zheng State e Lab of IS, Xidian Universit, Xi an, 7171, P. R. China jpzheng@idian.edu.cn Ma 6,
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationRecursive Estimation
Recursive Estimation Raffaello D Andrea Spring 07 Problem Set 3: Etracting Estimates from Probability Distributions Last updated: April 5, 07 Notes: Notation: Unlessotherwisenoted,, y,andz denoterandomvariables,
More informationINF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification
INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging
More informationProbability Theory Refresher
Machine Learning WS24 Module IN264 Sheet 2 Page Machine Learning Worksheet 2 Probabilit Theor Refresher Basic Probabilit Problem : A secret government agenc has developed a scanner which determines whether
More information4 Bias-Variance for Ridge Regression (24 points)
Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationSpeech and Language Processing
Speech and Language Processing Lecture 5 Neural network based acoustic and language models Information and Communications Engineering Course Takahiro Shinoaki 08//6 Lecture Plan (Shinoaki s part) I gives
More informationReview of Probability
Review of robabilit robabilit Theor: Man techniques in speech processing require the manipulation of probabilities and statistics. The two principal application areas we will encounter are: Statistical
More informationLanguage and Statistics II
Language and Statistics II Lecture 19: EM for Models of Structure Noah Smith Epectation-Maimization E step: i,, q i # p r $ t = p r i % ' $ t i, p r $ t i,' soft assignment or voting M step: r t +1 # argma
More informationSemi-Supervised Laplacian Regularization of Kernel Canonical Correlation Analysis
Semi-Supervised Laplacian Regularization of Kernel Canonical Correlation Analsis Matthew B. Blaschko, Christoph H. Lampert, & Arthur Gretton Ma Planck Institute for Biological Cbernetics Tübingen, German
More informationMATH 2300 review problems for Exam 3 ANSWERS
MATH 300 review problems for Eam 3 ANSWERS. Check whether the following series converge or diverge. In each case, justif our answer b either computing the sum or b b showing which convergence test ou are
More informationLecture Note 2: Estimation and Information Theory
Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 2: Estimation and Information Theory Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 2.1 Estimation A static estimation problem
More informationGaussian Mixture Models, Expectation Maximization
Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationCSE 559A: Computer Vision Tomorrow Zhihao's Office Hours back in Jolley 309: 10:30am-Noon
CSE 559A: Computer Vision ADMINISTRIVIA Tomorrow Zhihao's Office Hours back in Jolley 309: 0:30am-Noon Fall 08: T-R: :30-pm @ Lopata 0 This Friday: Regular Office Hours Net Friday: Recitation for PSET
More informationCopyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS
Copright, 8, RE Kass, EN Brown, and U Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Chapter 6 Random Vectors and Multivariate Distributions 6 Random Vectors In Section?? we etended
More informationIntroduction to Machine Learning Spring 2018 Note Duality. 1.1 Primal and Dual Problem
CS 189 Introduction to Machine Learning Spring 2018 Note 22 1 Duality As we have seen in our discussion of kernels, ridge regression can be viewed in two ways: (1) an optimization problem over the weights
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationChapter 11. Correlation and Regression
Chapter 11 Correlation and Regression Correlation A relationship between two variables. The data can be represented b ordered pairs (, ) is the independent (or eplanator) variable is the dependent (or
More informationMACHINE LEARNING ADVANCED MACHINE LEARNING
MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 2 2 MACHINE LEARNING Overview Definition pdf Definition joint, condition, marginal,
More informationDependence and scatter-plots. MVE-495: Lecture 4 Correlation and Regression
Dependence and scatter-plots MVE-495: Lecture 4 Correlation and Regression It is common for two or more quantitative variables to be measured on the same individuals. Then it is useful to consider what
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationSolutions to Two Interesting Problems
Solutions to Two Interesting Problems Save the Lemming On each square of an n n chessboard is an arrow pointing to one of its eight neighbors (or off the board, if it s an edge square). However, arrows
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationCS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning
CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning Professor Erik Sudderth Brown University Computer Science October 4, 2016 Some figures and materials courtesy
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationDecision-Point Signal to Noise Ratio (SNR)
Decision-Point Signal to Noise Ratio (SNR) Receiver Decision ^ SNR E E e y z Matched Filter Bound error signal at input to decision device Performance upper-bound on ISI channels Achieved on memoryless
More informationMachine Learning and Data Mining. Linear regression. Kalev Kask
Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance
More information8. BOOLEAN ALGEBRAS x x
8. BOOLEAN ALGEBRAS 8.1. Definition of a Boolean Algebra There are man sstems of interest to computing scientists that have a common underling structure. It makes sense to describe such a mathematical
More informationMobile Robot Localization
Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationDay 4: Classification, support vector machines
Day 4: Classification, support vector machines Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 21 June 2018 Topics so far
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationCS Homework 3. October 15, 2009
CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More information6.869 Advances in Computer Vision. Prof. Bill Freeman March 1, 2005
6.869 Advances in Computer Vision Prof. Bill Freeman March 1 2005 1 2 Local Features Matching points across images important for: object identification instance recognition object class recognition pose
More informationNaïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 3 September 14, Readings: Mitchell Ch Murphy Ch.
School of Computer Science 10-701 Introduction to Machine Learning aïve Bayes Readings: Mitchell Ch. 6.1 6.10 Murphy Ch. 3 Matt Gormley Lecture 3 September 14, 2016 1 Homewor 1: due 9/26/16 Project Proposal:
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More informationChapter 3. Expectations
pectations - Chapter. pectations.. Introduction In this chapter we define five mathematical epectations the mean variance standard deviation covariance and correlation coefficient. We appl these general
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationReferences. Lecture 3: Shrinkage. The Power of Amnesia. Ockham s Razor
References Lecture 3: Shrinkage Isabelle Guon guoni@inf.ethz.ch Structural risk minimization for character recognition Isabelle Guon et al. http://clopinet.com/isabelle/papers/sr m.ps.z Kernel Ridge Regression
More informationINF Introduction to classifiction Anne Solberg
INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300
More information0.24 adults 2. (c) Prove that, regardless of the possible values of and, the covariance between X and Y is equal to zero. Show all work.
1 A socioeconomic stud analzes two discrete random variables in a certain population of households = number of adult residents and = number of child residents It is found that their joint probabilit mass
More informationBasic concepts in estimation
Basic concepts in estimation Random and nonrandom parameters Definitions of estimates ML Maimum Lielihood MAP Maimum A Posteriori LS Least Squares MMS Minimum Mean square rror Measures of quality of estimates
More information1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure
Overview Breiman L (2001). Statistical modeling: The two cultures Statistical learning reading group Aleander L 1 Histor of statistical/machine learning 2 Supervised learning 3 Two approaches to supervised
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationApproximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery
Approimate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery arxiv:1606.00901v1 [cs.it] Jun 016 Shuai Huang, Trac D. Tran Department of Electrical and Computer Engineering Johns
More informationLinear Models for Regression. Sargur Srihari
Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationV-0. Review of Probability
V-0. Review of Probabilit Random Variable! Definition Numerical characterization of outcome of a random event!eamles Number on a rolled die or dice Temerature at secified time of da 3 Stock Market at close
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationNST1A: Mathematics II (Course A) End of Course Summary, Lent 2011
General notes Proofs NT1A: Mathematics II (Course A) End of Course ummar, Lent 011 tuart Dalziel (011) s.dalziel@damtp.cam.ac.uk This course is not about proofs, but rather about using different techniques.
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Due Thursday, September 19, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More information