CSE 559A: Computer Vision Tomorrow Zhihao's Office Hours back in Jolley 309: 10:30am-Noon
|
|
- Steven Ferguson
- 5 years ago
- Views:
Transcription
1 CSE 559A: Computer Vision ADMINISTRIVIA Tomorrow Zhihao's Office Hours back in Jolley 309: 0:30am-Noon Fall 08: T-R: Lopata 0 This Friday: Regular Office Hours Net Friday: Recitation for PSET Try all problems before coming to recitation Monday Office Hours again in Jolley 7 Instructor: Ayan Chakrabarti (ayan@wustl.edu. Course Staff: Zhihao ia, Charlie Wu, Han Liu Sep 3, 08 Setting There is a true image (or image-like object What we have is some degraded observation that we want to estimate that is based on The degradation might be "stochastic": so we say p(y Simplest Case (single piel: : Estimate from y: "denoising", y R y y = + σϵ p(y = (y; μ =, p(y is our observation model, called the likelihood function Likelihood of observation given true image Maimum Likelihood Estimator: = arg ma p(y For p(y = (y;,, what is ML Estimate? = y That's a little disappointing. y p(y = ep ( π σ (y
2 0 What we really want to do is maimize p( y, not the likelihood p(y Bayes Rule p(y is the distribution of observation y given true image p( y is the distribution of true given observation y Simple Derivation p( y = p(y p( p(y p(y p( : Likelihood : "Prior" Distribution on p( y = p(y p( p(y p( d What we believe about before we see the observation p( y : Posterior Distribution of given observation y What we believe about a er we see the observation p(, y = p(p(y = p(yp( y p( y = p(y p( p(y = p(y p( d p(y p( y = p(y p( p(y p( d p( y = p(y p( p(y p( d Maimum A Posteriori (MAP Estimation Maimum A Posteriori (MAP Estimation The most likely answer under the posterior distribution Other Estimators Possible: Try to minimize some "risk" or loss function Measures how bad an answer Minimize Epected Risk Under Posterior Let's say L(, = (. What is? = arg ma p( y is when true value is = arg min L(, p( yd L(, How do we compute this? = arg ma p( y = arg ma p( y = arg ma p(y p( p(y p( d = arg ma p(y p( = arg ma log[p(y p(] = arg ma log p(y + log p( = arg min log p(y log p( p( y = = p( yd (Turn this into minimization of a cost
3 Maimum A Posteriori (MAP Estimation Maimum A Posteriori (MAP Estimation Back to Denoising (y ( 0.5 = arg min + C + + C How do you compute? = arg min + (y ( 0.5 (y p(y = ep ( πσ Let's choose a simple prior: p( = (; 0.5, In this case, simpler option available. Complete the squares, epress as a single quadratic. More generally, to minimize C(, find C ( = 0, check C ( > 0 What is C (? C y ( = ( Maimum A Posteriori (MAP Estimation (y ( 0.5 = arg min + y C ( = = 0 y + 0.5σ = + σ C ( = + > 0 Maimum A Posteriori (MAP Estimation Weighted sum of observation and prior mean Closer to prior mean when (y = arg min σ ( is high y + 0.5σ = + σ Means the function is conve (quadratic functions with a positive coefficient for are conve
4 Let's go beyond "single piel images": Y [n] = [n] + ϵ[n] If noise is independent at each piel p({y[n]} {[n]} = p(y[n] [n] = (Y [n]; [n], σ n n Similarly, if prior p({[n]} is defined to model piels independently: Product turns into sum a er taking log p({[n]} = ([n]; 0.5, n (Y[n] [n] ([n] 0.5 [n] = arg min = + {[n]} Note: This is a minimization over multiple variables n But since the different variables don't interact with each other, we can optimize each piel separately Y[n] [n] = n + Multi-variate Gaussians Re-interpret these as multi-variate Gaussians For R d : (π d det(σ p( = ep ( ( μ ( μ Here, μ R d is a mean "vector", same size as Co-variance Σ is a symmetric, positive-definite matri d d matri When the different elements of are independent, off-diagonal elements of Σ are 0. T Σ Multi-variate Gaussians Represent and Y as vectorized images Here the mean-vector for Y is p(y = (Y ;, σ I The co-variance matri is a diagonal matri with all diagonal entries = σ How do we represent our prior and likelihood as a multi-variate Gaussian?
5 Multi-variate Gaussians μ =, Σ = I det(σ = ( d Σ = σ I (π d det(σ p(y = ep ( (Y μ (Y μ T Σ T T Σ (Y μ (Y μ = (Y I(Y = (Y I (Y = (Y (Y = Y T = (Y [n] [n] T min But now, we can use these multi-variate distributions to model correlations and interactions between different piels Let's stay with denoising, but talk about defining a prior in the Wavelet domain Instead of saying, individual piel values are independent, let's say individual wavelet coefficients are independent Let's also put different means and variances on different wavelet coefficients All the 'derivative' coefficients have zero mean Variance goes up as we go to coarser levels p(y = (Y;, σ I p( = (Y; 0.5I, I = arg Y σ n Let C = W, where W represents the Wavelet transform matri W is unitary, = = W T C W Define our prior on C: W T p(c = (C; μ c, D c Change of Variables We have probability distribution on a random variable U : p U (U U = f (V is a one-to-one function p V (V = p U (f (V? Realize, that p( are densities. So we need to account for "scaling" of the probability measure. Now, D c is a diagonal matri, with entries equal to corresponding variances of coefficients Off-diagonal elements 0 implies Wavelet co-efficients are un-correlated A prior on C implies a prior on C is just a different representation for where det( J f ij = dui dvj p (UdU U = p (f (V du U du = det( J f dv So, p V (V = p U (f (Vdet( J f If U = f (V = A V is linear, J = A. If A is unitary, det(a =.
6 Let C = W, where W represents the Wavelet transform matri Now, let's denoise with this prior. p(c = (C; μ c, D c p( = ep ( (W μ (W c T D c μ c (π det( d D c ep ( ( W T μ W ( c T W T D c W T μ c = (; W T μ c, W T D c W where, μ = W T μ c, Σ = W T D c W. (We're assuming noise variance is. How do we minimize this? min = arg Y + ( μ ( μ Take derivative, set to 0. Check second derivative is positive. Take gradient (vector derivative, set to 0. Check that "Hessian" is positive-definite. y = f ( is a scalar valued function of a vector R d. T Σ The gradient is a vector same sized as, with each entry being the partial derivative of y with respect to that element of. y y y = y 3 y = f ( is a scalar valued function of a vector R d. The gradient is a vector same sized as, with each entry being the partial derivative of y with respect to that element of. The Hessian is a matri of size d d for R d : y y y = y 3 ( H y ij = y i j Properties of a multi-variate quadratic form where Q is a symmetric d d matri, R is a d-dimensional vector, S is a scalar. Note that this is a scalar value Comes from the following identities T A = (A + A T T R = R T = R T Q T R + S = Q R The Hessian is simply given by Q (it is constant, doesn't depend on If Q is positive definite, then = 0 gives us the unique minimizer. Q = 0 Q R = 0 = R
7 where, μ = W T μ c, Σ = W T D c W. Q = (I + Σ min = arg Y + ( μ ( μ is positive definite (sum of two positive-definite matrices is positive-definite. i.e., taking inverse in the original de-correlated wavelet domain. i.e., inverse-wavelet transform of the scaled wavelet coefficients. T Σ = arg min (I + (Y + μ + ( Y + μ T Σ T Σ μ T Σ = (I + Σ (Y + Σ μ Σ = ( W T D c W = W D c W T = W T D c W Σ μ = W T D c Wμ = W T D c W W T μ c = W T D c μ c = arg min T (I + Σ T (Y + Σ μ + ( Y + μ T Σ μ = (I + Σ (Y + Σ μ Σ = W T D c W, Σ μ = W T D c Note that we can't actually form I + Σ, let alone invert it: d d matri, with d = total number of piels. I + D c I + Σ = I + W T D c W = W T W + W T D c W = W T IW + W T D c W = W T (I + D c W (I + Σ = W T (I + D c W is also diagonal, so it's inverse is just inverting diagonal elements. = W T (I + D c W (Y + W T D c μ c μ c Let's call C = W, = W Y. = arg min T (I + Σ T (Y + Σ μ + ( Y + μ T Σ μ C y = W T (I + D c W (Y + W T D c μ c W = W W T (I + D c (W Y + W W T D c μc C = (I + D c ( + C y D c μc So what we did is that we "denoised" in the wavelet domain, with the noise variance in the wavelet domain also being equal to. Y = + ϵ C y = C + W ϵ. W ϵ is also a random vector with 0 mean, and covariance = WI W T = I. So far, we've been in a Bayesian setting with a prior. = arg min log p(y log p( We're balancing off fidelity to observation (first term, likelihood, and what we believe about (second term, prior But we need p( to be 'a proper' probability distribution. Sometimes, that's too restrictive. Instead just think of these as a "data term" (still comes from observation model, and have a generic "regularization" term. Here, G is a vector corresponding to the image we get by convolving with the Gaussian -derivative filter. = arg min D(; Y + R( = arg Y + λ ( + min G G y
8 Data term still has the same form as for denoising. But now, regularization just says we want the and y spatial derivatives of our image to be small Encodes a prior that natural images are smooth. This isn't a prior. Because gradients aren't a "representation" of. But better, because they don't enforce smoothness at alternate gradient locations. How do we minimize this? = arg Y + λ ( + min G G y
CSE 559A: Computer Vision
CSE 559A: Computer Vision Fall 208: T-R: :30-pm @ Lopata 0 Instructor: Ayan Chakrabarti (ayan@wustl.edu). Course Staff: Zhihao ia, Charlie Wu, Han Liu http://www.cse.wustl.edu/~ayan/courses/cse559a/ Sep
More informationPSET 0 Due Today by 11:59pm Any issues with submissions, post on Piazza. Fall 2018: T-R: Lopata 101. Median Filter / Order Statistics
CSE 559A: Computer Vision OFFICE HOURS This Friday (and this Friday only): Zhihao's Office Hours in Jolley 43 instead of 309. Monday Office Hours: 5:30-6:30pm, Collaboration Space @ Jolley 27. PSET 0 Due
More informationCSE 559A: Computer Vision EVERYONE needs to fill out survey.
CSE 559A: Computer Vision ADMINISRIVIA EVERYONE needs to fill out survey. Setup git and Anaconda, send us your public key, and do problem set 0. Do immediately: submit public key and make sure you can
More informationCSE 559A: Computer Vision
CSE 559A: Computer Vision Fall 2017: T-R: 11:30-1pm @ Lopata 101 Instructor: Ayan Chakrabarti (ayan@wustl.edu). Staff: Abby Stylianou (abby@wustl.edu), Jarett Gross (jarett@wustl.edu) http://www.cse.wustl.edu/~ayan/courses/cse559a/
More informationGENERAL. CSE 559A: Computer Vision AUTOGRAD AUTOGRAD
CSE 559A: Computer Vision Fall 2017: T-R: 11:30-1pm @ Lopata 101 GENERAL PSET 5 Posted. One day extension (now due on the Friday two weeks from now) You get to implement SLIC, and your own conv layer!
More informationBayesian Paradigm. Maximum A Posteriori Estimation
Bayesian Paradigm Maximum A Posteriori Estimation Simple acquisition model noise + degradation Constraint minimization or Equivalent formulation Constraint minimization Lagrangian (unconstraint minimization)
More informationSTA 414/2104, Spring 2014, Practice Problem Set #1
STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,
More informationADMINISTRIVIA SENSOR
CSE 559A: Computer Vision ADMINISRIVIA Everyone needs to fill out survey and join Piazza. We strongly encourage you to use Piazza over e-mail for questions (mark only to instructors if you wish). Also
More information2 Statistical Estimation: Basic Concepts
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:
More informationMachine Learning Lecture 2
Announcements Machine Learning Lecture 2 Eceptional number of lecture participants this year Current count: 449 participants This is very nice, but it stretches our resources to their limits Probability
More informationExpectation Maximization Deconvolution Algorithm
Epectation Maimization Deconvolution Algorithm Miaomiao ZHANG March 30, 2011 Abstract In this paper, we use a general mathematical and eperimental methodology to analyze image deconvolution. The main procedure
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 6 Computing Announcements Machine Learning Lecture 2 Course webpage http://www.vision.rwth-aachen.de/teaching/ Slides will be made available on
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationGRADIENT DESCENT. CSE 559A: Computer Vision GRADIENT DESCENT GRADIENT DESCENT [0, 1] Pr(y = 1) w T x. 1 f (x; θ) = 1 f (x; θ) = exp( w T x)
0 x x x CSE 559A: Computer Vision For Binary Classification: [0, ] f (x; ) = σ( x) = exp( x) + exp( x) Output is interpreted as probability Pr(y = ) x are the log-odds. Fall 207: -R: :30-pm @ Lopata 0
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationMon Feb Matrix algebra and matrix inverses. Announcements: Warm-up Exercise:
Math 2270-004 Week 5 notes We will not necessarily finish the material from a given day's notes on that day We may also add or subtract some material as the week progresses, but these notes represent an
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More informationASummaryofGaussianProcesses Coryn A.L. Bailer-Jones
ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe
More informationBasic concepts in estimation
Basic concepts in estimation Random and nonrandom parameters Definitions of estimates ML Maimum Lielihood MAP Maimum A Posteriori LS Least Squares MMS Minimum Mean square rror Measures of quality of estimates
More informationAn example to illustrate frequentist and Bayesian approches
Frequentist_Bayesian_Eample An eample to illustrate frequentist and Bayesian approches This is a trivial eample that illustrates the fundamentally different points of view of the frequentist and Bayesian
More informationSeptember 12, Math Analysis Ch 1 Review Solutions. #1. 8x + 10 = 4x 30 4x 4x 4x + 10 = x = x = 10.
#1. 8x + 10 = 4x 30 4x 4x 4x + 10 = 30 10 10 4x = 40 4 4 x = 10 Sep 5 7:00 AM 1 #. 4 3(x + ) = 5x 7(4 x) 4 3x 6 = 5x 8 + 7x CLT 3x = 1x 8 +3x +3x = 15x 8 +8 +8 6 = 15x 15 15 x = 6 15 Sep 5 7:00 AM #3.
More informationWhere now? Machine Learning and Bayesian Inference
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone etension 67 Email: sbh@clcamacuk wwwclcamacuk/ sbh/ Where now? There are some simple take-home messages from
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic
More informationL20: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs
L0: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Elias Tragas tragas@cs.toronto.edu October 3, 2016 Elias Tragas Naive Bayes and Gaussian Bayes Classifier October 3, 2016 1 / 23 Naive Bayes Bayes Rules: Naive
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationSYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I
SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationMarginal density. If the unknown is of the form x = (x 1, x 2 ) in which the target of investigation is x 1, a marginal posterior density
Marginal density If the unknown is of the form x = x 1, x 2 ) in which the target of investigation is x 1, a marginal posterior density πx 1 y) = πx 1, x 2 y)dx 2 = πx 2 )πx 1 y, x 2 )dx 2 needs to be
More informationReminders. Thought questions should be submitted on eclass. Please list the section related to the thought question
Linear regression Reminders Thought questions should be submitted on eclass Please list the section related to the thought question If it is a more general, open-ended question not exactly related to a
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationLecture 2: Parameter Learning in Fully Observed Graphical Models. Sam Roweis. Monday July 24, 2006 Machine Learning Summer School, Taiwan
Lecture 2: Parameter Learning in Fully Observed Graphical Models Sam Roweis Monday July 24, 2006 Machine Learning Summer School, Taiwan Likelihood Functions So far we have focused on the (log) probability
More informationFrequentist-Bayesian Model Comparisons: A Simple Example
Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal
More informationState Space and Hidden Markov Models
State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationBayes Decision Theory - I
Bayes Decision Theory - I Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statistical Learning from Data Goal: Given a relationship between a feature vector and a vector y, and iid data samples ( i,y i ), find
More informationMobile Robot Localization
Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations
More informationToday. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion
Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Mengye Ren mren@cs.toronto.edu October 18, 2015 Mengye Ren Naive Bayes and Gaussian Bayes Classifier October 18, 2015 1 / 21 Naive Bayes Bayes Rules: Naive Bayes
More informationLecture Note 2: Estimation and Information Theory
Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 2: Estimation and Information Theory Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 2.1 Estimation A static estimation problem
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Ladislav Rampasek slides by Mengye Ren and others February 22, 2016 Naive Bayes and Gaussian Bayes Classifier February 22, 2016 1 / 21 Naive Bayes Bayes Rule:
More informationNoise-Blind Image Deblurring Supplementary Material
Noise-Blind Image Deblurring Supplementary Material Meiguang Jin University of Bern Switzerland Stefan Roth TU Darmstadt Germany Paolo Favaro University of Bern Switzerland A. Upper and Lower Bounds Our
More informationBayesian Interpretations of Regularization
Bayesian Interpretations of Regularization Charlie Frogner 9.50 Class 15 April 1, 009 The Plan Regularized least squares maps {(x i, y i )} n i=1 to a function that minimizes the regularized loss: f S
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More information1 Kernel methods & optimization
Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions
More informationAnnouncements Wednesday, November 01
Announcements Wednesday, November 01 WeBWorK 3.1, 3.2 are due today at 11:59pm. The quiz on Friday covers 3.1, 3.2. My office is Skiles 244. Rabinoffice hours are Monday, 1 3pm and Tuesday, 9 11am. Section
More informationLikelihood Bounds for Constrained Estimation with Uncertainty
Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 5 Seville, Spain, December -5, 5 WeC4. Likelihood Bounds for Constrained Estimation with Uncertainty
More information1 Bayesian Linear Regression (BLR)
Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationIntroduction to Machine Learning
Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released
More informationy Xw 2 2 y Xw λ w 2 2
CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06
More informationBayesian statistics. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Bayesian statistics DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall15 Carlos Fernandez-Granda Frequentist vs Bayesian statistics In frequentist statistics
More informationDerivation of the Kalman Filter
Derivation of the Kalman Filter Kai Borre Danish GPS Center, Denmark Block Matrix Identities The key formulas give the inverse of a 2 by 2 block matrix, assuming T is invertible: T U 1 L M. (1) V W N P
More informationColor Scheme. swright/pcmi/ M. Figueiredo and S. Wright () Inference and Optimization PCMI, July / 14
Color Scheme www.cs.wisc.edu/ swright/pcmi/ M. Figueiredo and S. Wright () Inference and Optimization PCMI, July 2016 1 / 14 Statistical Inference via Optimization Many problems in statistical inference
More informationMACHINE LEARNING ADVANCED MACHINE LEARNING
MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 22 MACHINE LEARNING Discrete Probabilities Consider two variables and y taking discrete
More informationToday. Statistical Learning. Coin Flip. Coin Flip. Experiment 1: Heads. Experiment 1: Heads. Which coin will I use? Which coin will I use?
Today Statistical Learning Parameter Estimation: Maximum Likelihood (ML) Maximum A Posteriori (MAP) Bayesian Continuous case Learning Parameters for a Bayesian Network Naive Bayes Maximum Likelihood estimates
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationRegression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.
Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions
More informationWhy do we care? Examples. Bayes Rule. What room am I in? Handling uncertainty over time: predicting, estimating, recognizing, learning
Handling uncertainty over time: predicting, estimating, recognizing, learning Chris Atkeson 004 Why do we care? Speech recognition makes use of dependence of words and phonemes across time. Knowing where
More informationCS 4495 Computer Vision Principle Component Analysis
CS 4495 Computer Vision Principle Component Analysis (and it s use in Computer Vision) Aaron Bobick School of Interactive Computing Administrivia PS6 is out. Due *** Sunday, Nov 24th at 11:55pm *** PS7
More informationApproximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery
Approimate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery arxiv:1606.00901v1 [cs.it] Jun 016 Shuai Huang, Trac D. Tran Department of Electrical and Computer Engineering Johns
More informationINF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification
INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationComputer Vision Group Prof. Daniel Cremers. 3. Regression
Prof. Daniel Cremers 3. Regression Categories of Learning (Rep.) Learnin g Unsupervise d Learning Clustering, density estimation Supervised Learning learning from a training data set, inference on the
More informationAnnouncements Wednesday, November 01
Announcements Wednesday, November 01 WeBWorK 3.1, 3.2 are due today at 11:59pm. The quiz on Friday covers 3.1, 3.2. My office is Skiles 244. Rabinoffice hours are Monday, 1 3pm and Tuesday, 9 11am. Section
More informationCovariance Matrix Simplification For Efficient Uncertainty Management
PASEO MaxEnt 2007 Covariance Matrix Simplification For Efficient Uncertainty Management André Jalobeanu, Jorge A. Gutiérrez PASEO Research Group LSIIT (CNRS/ Univ. Strasbourg) - Illkirch, France *part
More informationMachine Learning Lecture 3
Announcements Machine Learning Lecture 3 Eam dates We re in the process of fiing the first eam date Probability Density Estimation II 9.0.207 Eercises The first eercise sheet is available on L2P now First
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationParameter estimation Conditional risk
Parameter estimation Conditional risk Formalizing the problem Specify random variables we care about e.g., Commute Time e.g., Heights of buildings in a city We might then pick a particular distribution
More informationBayesian Decision Theory
Bayesian Decision Theory Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Bayesian Decision Theory Bayesian classification for normal distributions Error Probabilities
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More informationLinear regression example Simple linear regression: f(x) = ϕ(x)t w w ~ N(0, ) The mean and covariance are given by E[f(x)] = ϕ(x)e[w] = 0.
Gaussian Processes Gaussian Process Stochastic process: basically, a set of random variables. may be infinite. usually related in some way. Gaussian process: each variable has a Gaussian distribution every
More informationLet's transfer our results for conditional probability for events into conditional probabilities for random variables.
Kolmogorov/Smoluchowski equation approach to Brownian motion Tuesday, February 12, 2013 1:53 PM Readings: Gardiner, Secs. 1.2, 3.8.1, 3.8.2 Einstein Homework 1 due February 22. Conditional probability
More informationChoosing among models
Eco 515 Fall 2014 Chris Sims Choosing among models September 18, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
More informationStatistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes
Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan 1, M. Koval and P. Parashar 1 Applications of Gaussian
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationStochastic Integration (Simple Version)
Stochastic Integration (Simple Version) Tuesday, March 17, 2015 2:03 PM Reading: Gardiner Secs. 4.1-4.3, 4.4.4 But there are boundary issues when (if ) so we can't apply the standard delta function integration
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationRecursive Estimation
Recursive Estimation Raffaello D Andrea Spring 07 Problem Set 3: Etracting Estimates from Probability Distributions Last updated: April 5, 07 Notes: Notation: Unlessotherwisenoted,, y,andz denoterandomvariables,
More informationCSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes
CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail
More informationBayesian Inference of Noise Levels in Regression
Bayesian Inference of Noise Levels in Regression Christopher M. Bishop Microsoft Research, 7 J. J. Thomson Avenue, Cambridge, CB FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationA 2. =... = c c N. 's arise from the three types of elementary row operations. If rref A = I its determinant is 1, and A = c 1
Theorem: Let A n n Then A 1 exists if and only if det A 0 proof: We already know that A 1 exists if and only if the reduced row echelon form of A is the identity matrix Now, consider reducing A to its
More informationMatrix Dimensions(orders)
Definition of Matrix A matrix is a collection of numbers arranged into a fixed number of rows and columns. Usually the numbers are real numbers. In general, matrices can contain complex numbers but we
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
More information