CSC321: 2011 Introduction to Neural Networks and Machine Learning. Lecture 11: Bayesian learning continued. Geoffrey Hinton
Transcription
1 CSC321: 2011 Introduction to Neural Networks and Machine Learning. Lecture 11: Bayesian learning continued. Geoffrey Hinton
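2 Bayes Theorem. The joint probability can be written with either conditional probability: $p(D)\,p(W \mid D) = p(D, W) = p(W)\,p(D \mid W)$. So $p(W \mid D) = \dfrac{p(W)\,p(D \mid W)}{p(D)}$, where $p(W)$ is the prior probability of weight vector $W$, $p(W \mid D)$ is the posterior probability of weight vector $W$ given training data $D$, $p(D \mid W)$ is the probability of the observed data given $W$, and $p(D) = \int_W p(W)\,p(D \mid W)$. Cost: $-\log p(W \mid D) = -\log p(W) - \log p(D \mid W) + \log p(D)$.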
3 Maximum A Posteriori Learning. This trades off the prior probabilities of the parameters against the probability of the data given the parameters. It looks for the parameters that have the greatest product of the prior term and the likelihood term. Minimizing the squared weights is equivalent to maximizing the log probability of the weights under a zero-mean Gaussian prior: $p(w) = \frac{1}{\sqrt{2\pi}\,\sigma_W}\, e^{-w^2 / 2\sigma_W^2}$, so $-\log p(w) = \frac{w^2}{2\sigma_W^2} + k$. [Figure: zero-mean Gaussian prior $p(w)$ plotted against $w$, centered at 0.]
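4 The Bayesian interpretation of weight decay. $-\log p(W \mid D) = -\log p(D \mid W) - \log p(W) + \log p(D) = \frac{1}{2\sigma_D^2} \sum_c (y_c - d_c)^2 + \frac{1}{2\sigma_W^2} \sum_i w_i^2 + \text{constant}$, assuming that the model makes a Gaussian prediction (first term) and assuming a Gaussian prior for the weights (second term). Multiplying through by $2\sigma_D^2$ gives the cost $C^* = E + \frac{\sigma_D^2}{\sigma_W^2} \sum_i w_i^2$, with $E = \sum_c (y_c - d_c)^2$. So the correct value of the weight decay parameter is the ratio of two variances. It's not just an arbitrary hack.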
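A minimal sketch of the variance-ratio claim, using a linear model as a stand-in for the neural net so the weight-decay solution has a closed form. The data, the two standard deviations and all variable names are assumptions made for illustration; the check is that the weights minimizing the squared error plus weight decay (with coefficient noise variance over prior variance) also zero the gradient of the negative log posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: linear model y = X @ w with Gaussian output noise.
sigma_d = 0.5   # assumed std of the output noise
sigma_w = 2.0   # assumed std of the Gaussian prior on the weights
n_cases, n_weights = 40, 3

w_true = rng.normal(0.0, sigma_w, size=n_weights)
X = rng.normal(size=(n_cases, n_weights))
d = X @ w_true + rng.normal(0.0, sigma_d, size=n_cases)

def neg_log_posterior(w):
    """-log p(W|D) up to an additive constant."""
    sq_err = np.sum((X @ w - d) ** 2)
    return sq_err / (2 * sigma_d**2) + np.sum(w**2) / (2 * sigma_w**2)

# Weight-decay coefficient implied by the two variances.
weight_decay = sigma_d**2 / sigma_w**2

# Minimizer of sum_c (y_c - d_c)^2 + weight_decay * sum_i w_i^2 (ridge regression).
w_map = np.linalg.solve(X.T @ X + weight_decay * np.eye(n_weights), X.T @ d)

# The gradient of the negative log posterior should vanish at w_map.
grad = X.T @ (X @ w_map - d) / sigma_d**2 + w_map / sigma_w**2
print("weight decay:", weight_decay)
print("max |gradient| at MAP:", np.max(np.abs(grad)))
```

Any other weight-decay coefficient would correspond to a different assumed ratio of noise variance to prior variance.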
5 Estimating the variance of the output noise. After we have learned a model that minimizes the squared error, we can find the best value for the output noise. The best value is the one that maximizes the probability of producing exactly the correct answers after adding Gaussian noise to the output produced by the neural net. The best value is found by simply using the variance of the residual errors.
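A small sketch of this estimate, assuming zero-mean Gaussian noise and a made-up array of residuals (target minus network output). Under that assumption the maximum-likelihood variance is the mean squared residual, which the coarse scan below is meant to confirm; the names and numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical residuals (target minus network output) of an already-trained model.
residuals = rng.normal(0.0, 0.3, size=500)

def gaussian_log_likelihood(variance):
    """Log probability of the residuals under zero-mean Gaussian noise."""
    n = residuals.size
    return -0.5 * n * np.log(2 * np.pi * variance) - np.sum(residuals**2) / (2 * variance)

# Maximum-likelihood noise variance: the mean squared residual.
noise_var = np.mean(residuals**2)

# Sanity check: scan candidate variances around the estimate.
candidates = np.linspace(0.5 * noise_var, 1.5 * noise_var, 101)
log_liks = [gaussian_log_likelihood(v) for v in candidates]
best = candidates[np.argmax(log_liks)]
print("estimated variance:", noise_var, "best candidate in scan:", best)
```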
6 Estimating the variance of the Gaussian prior on the weights. After learning a model with some initial choice of variance for the weight prior, we could do a dirty trick called empirical Bayes. Set the variance of the Gaussian prior to be whatever makes the weights that the model learned most likely. This is done by simply fitting a zero-mean Gaussian to the one-dimensional distribution of the learned weight values.
7 MacKay's quick and dirty method of choosing the ratio of the noise variance to the weight prior variance. Start with guesses for both the noise variance and the weight prior variance. Do some learning. Reset the noise variance to fit the residual errors. Reset the weight prior variance to fit the actual learned weights. Repeat until bored.
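The loop on this slide can be sketched as below. As an assumption made to keep the example short, a linear model replaces the neural net, so the "do some learning" step is a closed-form weight-decay solve rather than gradient descent; the data and initial guesses are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy data from a linear model; it stands in for a neural net.
n_cases, n_weights = 200, 10
w_true = rng.normal(0.0, 1.5, size=n_weights)
X = rng.normal(size=(n_cases, n_weights))
d = X @ w_true + rng.normal(0.0, 0.4, size=n_cases)

noise_var, prior_var = 1.0, 1.0        # initial guesses for both variances
for step in range(20):                 # "repeat until bored"
    # Do some learning, using the ratio of the variances as the weight decay.
    decay = noise_var / prior_var
    w = np.linalg.solve(X.T @ X + decay * np.eye(n_weights), X.T @ d)
    # Reset the noise variance to fit the residual errors.
    noise_var = np.mean((X @ w - d) ** 2)
    # Reset the prior variance by fitting a zero-mean Gaussian to the weights.
    prior_var = np.mean(w**2)

print("noise variance:", noise_var, "prior variance:", prior_var)
```

With a real network, the solve would be replaced by a few epochs of training with the current weight-decay coefficient.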
8 Full Bayesian Learning. Instead of trying to find the best single setting of the parameters (as in ML or MAP), compute the full posterior distribution over parameter settings. This is extremely computationally intensive for all but the simplest models (it's feasible for a biased coin). To make predictions, let each different setting of the parameters make its own prediction and then combine all these predictions by weighting each of them by the posterior probability of that setting of the parameters. This is also computationally intensive. The full Bayesian approach allows us to use complicated models even when we do not have much data.
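For the biased coin mentioned above, the full posterior is cheap enough to compute on a grid. The sketch below assumes a uniform prior over the coin's bias and a made-up data set of three heads in three tosses; it contrasts the single ML setting with the posterior-weighted prediction.

```python
import numpy as np

# Hypothetical data: 3 tosses, all heads.
heads, tails = 3, 0

theta = np.linspace(0.0, 1.0, 1001)            # candidate values of p(heads)
prior = np.ones_like(theta) / theta.size        # uniform prior (an assumption)
likelihood = theta**heads * (1 - theta)**tails  # probability of the data for each theta
posterior = prior * likelihood
posterior /= posterior.sum()                    # renormalize

theta_ml = heads / (heads + tails)              # single best setting (maximum likelihood)
p_heads_bayes = np.sum(posterior * theta)       # every setting votes, weighted by posterior
print("ML prediction:", theta_ml, "full Bayesian prediction:", p_heads_bayes)
```

The ML setting predicts heads with certainty, while the posterior average stays well below 1 because many other biases remain plausible after so few tosses.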
9 Overfitting: A frequentist illusion? If you do not have much data, you should use a simple model, because a complex one will overfit. This is true, but only if you assume that fitting a model means choosing a single best setting of the parameters. If you use the full posterior over parameter settings, overfitting disappears! With little data, you get very vague predictions because many different parameter settings have significant posterior probability.
10 A classic example of overfitting. Which model do you believe? The complicated model fits the data better. But it is not economical and it makes silly predictions. But what if we start with a reasonable prior over all fifth-order polynomials and use the full posterior distribution? Now we get vague and sensible predictions. There is no reason why the amount of data should influence our prior beliefs about the complexity of the model.
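A hedged sketch of the fifth-order polynomial case. It assumes a Gaussian prior on the six coefficients and Gaussian output noise, in which case the posterior over coefficients is itself Gaussian and available in closed form. The six data points, the variances and the function names are hypothetical, not taken from the slide's figure.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: 6 noisy points that roughly follow a straight line.
x = np.array([-1.0, -0.6, -0.2, 0.2, 0.6, 1.0])
d = 0.5 * x + rng.normal(0.0, 0.1, size=x.size)

def features(x_values):
    """Powers x^0 .. x^5: a fifth-order polynomial is linear in these."""
    return np.vander(np.atleast_1d(x_values), 6, increasing=True)

noise_var, prior_var = 0.1**2, 1.0     # assumed noise and prior variances
Phi = features(x)

# Gaussian posterior over the 6 polynomial coefficients.
post_cov = np.linalg.inv(Phi.T @ Phi / noise_var + np.eye(6) / prior_var)
post_mean = post_cov @ Phi.T @ d / noise_var

def predictive(x_new):
    """Mean and variance of the prediction, averaged over the full posterior."""
    phi = features(x_new)
    mean = phi @ post_mean
    var = noise_var + np.sum((phi @ post_cov) * phi, axis=1)
    return mean, var

mean_in, var_in = predictive(0.0)      # inside the range of the data
mean_out, var_out = predictive(2.0)    # far outside the range of the data
print("predictive std inside:", np.sqrt(var_in[0]), "outside:", np.sqrt(var_out[0]))
```

Away from the data the predictive variance grows sharply, which is the vague but sensible behaviour the slide describes.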
11 Approximating full Bayesian learning in a neural network. If the neural net only has a few parameters we could put a grid over the parameter space and evaluate $p(W \mid D)$ at each grid-point. This is expensive, but it does not involve any gradient descent and there are no local optimum issues. After evaluating each grid point we use all of them to make predictions on test data. This is also expensive, but it works much better than ML learning when the posterior is vague or multimodal (this happens when data is scarce). $p(d_{test} \mid input_{test}) = \sum_{g \in grid} p(W_g \mid D)\, p(d_{test} \mid input_{test}, W_g)$
12 An example of full Bayesian learning. Allow each of the 6 weights or biases to have the 9 possible values [-2 : 0.5 : 2]. So there are 9^6 grid-points in parameter space. For each grid-point compute the probability of the observed outputs of all the training cases. This is the likelihood term and is explained on the next slide. Multiply the prior for each grid-point by the likelihood term and renormalize to get the posterior probability for each grid-point. Make predictions by using the posterior probabilities to average the predictions made by the different grid-points. [Figure: a neural net with 2 inputs, 1 output and 6 parameters; two units are labeled "bias".]
13 Computing the likelihood term for a logistic output unit. The output of the logistic unit is the probability that the network assigns to the answer 1. It assigns the complementary probability to the answer 0. $y = f(input, W_g)$. $p(output = d \mid input, W_g) = y$ if $d = 1$ and $1 - y$ if $d = 0$, which is $y^{d} (1 - y)^{1 - d}$. $p(\text{all training outputs} \mid W_g) = \prod_c p(output = d_c \mid input_c, W_g)$
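The grid procedure and the logistic likelihood can be sketched together. Because the exact architecture of the 6-parameter net is not recoverable from the transcript, this example assumes a smaller stand-in: a single logistic unit with 2 weights and 1 bias, giving 9^3 grid-points instead of 9^6. The training set, the uniform prior and the test input are also hypothetical.

```python
import itertools
import numpy as np

# Hypothetical training set: 2 inputs, binary target d.
inputs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0, 0, 1, 1])

values = np.linspace(-2.0, 2.0, 9)                           # the 9 values [-2 : 0.5 : 2]
grid = np.array(list(itertools.product(values, repeat=3)))   # (729, 3): w1, w2, bias

def predict(grid_params, x):
    """Logistic output y = p(answer is 1), for every row of x and every grid-point."""
    logits = x @ grid_params[:, :2].T + grid_params[:, 2]    # shape (n_cases, n_grid)
    return 1.0 / (1.0 + np.exp(-logits))

# Likelihood term: product over training cases of y^d * (1 - y)^(1 - d).
y = predict(grid, inputs)
d = targets[:, None]
likelihood = np.prod(y**d * (1 - y) ** (1 - d), axis=0)

# Multiply the prior by the likelihood and renormalize to get the posterior.
prior = np.full(len(grid), 1.0 / len(grid))                  # uniform prior (an assumption)
posterior = prior * likelihood
posterior /= posterior.sum()

# Prediction on a test input: posterior-weighted average over all grid-points.
x_test = np.array([[1.0, 0.5]])
p_test = predict(grid, x_test)[0] @ posterior
print("grid-points:", len(grid), "p(answer is 1 | test input):", p_test)
```

The same code applies to more parameters by changing `repeat` and the `predict` function, at a cost that grows as 9 to the power of the number of parameters.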
14 What can we do if there are too many parameters for a grid to be feasible? The number of grid points is exponential in the number of parameters. So we cannot deal with more than a few parameters using a grid. If there is enough data to make most parameter vectors very unlikely, only a tiny fraction of the grid points make a significant contribution to the predictions. Maybe we can just evaluate this tiny fraction. It might be good enough to just sample weight vectors according to their posterior probabilities. $p(y_{test} \mid input_{test}, D) = \sum_i p(W_i \mid D)\, p(y_{test} \mid input_{test}, W_i)$, where we sample weight vectors $W_i$ with probability $p(W_i \mid D)$.
15 One method for sampling weight vectors. In standard backpropagation we keep moving the weights in the direction that decreases the cost, i.e. the direction that increases the log likelihood plus the log prior, summed over all training cases. Suppose we add some Gaussian noise to the weight vector after each update. So the weight vector never settles down. It keeps wandering around, but it tends to prefer low cost regions of the weight space. Amazing fact: If we use just the right amount of noise, and if we let the weight vector wander around for long enough before we take a sample, we will get a sample from the true posterior over weight vectors. This is called a Markov Chain Monte Carlo method and it makes it feasible to use full Bayesian learning with hundreds or thousands of parameters. There are related MCMC methods that are more complicated but more efficient (we don't need to let the weights wander around for so long before we get samples from the posterior).
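One concrete reading of "gradient step plus the right amount of noise" is Langevin dynamics, where the noise variance equals the step size and the gradient is scaled by half the step size. That identification is an assumption here, not something the slide states. The sketch uses a hypothetical one-parameter model whose exact posterior is known, so the samples can be compared against it; with a small fixed step the match is approximate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 1-parameter model: data ~ N(w, noise_var), prior w ~ N(0, prior_var).
noise_var, prior_var = 1.0, 4.0
data = rng.normal(1.5, np.sqrt(noise_var), size=20)

def grad_cost(w):
    """Gradient of the cost -log p(w | data): squared-error term plus weight decay."""
    return np.sum(w - data) / noise_var + w / prior_var

# Exact posterior for comparison (conjugate Gaussian case).
post_var = 1.0 / (data.size / noise_var + 1.0 / prior_var)
post_mean = post_var * np.sum(data) / noise_var

step = 1e-3                      # step size; it also sets the noise variance
w, samples = 0.0, []
for t in range(60_000):
    # Move downhill on the cost, then add Gaussian noise so w never settles.
    w = w - 0.5 * step * grad_cost(w) + np.sqrt(step) * rng.normal()
    if t >= 10_000:              # let the weight wander before keeping samples
        samples.append(w)
samples = np.array(samples)

print("posterior mean:", post_mean, "sample mean:", samples.mean())
print("posterior var:", post_var, "sample var:", samples.var())
```

Successive samples are strongly correlated, which is why the more efficient MCMC methods mentioned above are attractive in practice.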