CSC321: 2011 Introduction to Neural Networks and Machine Learning. Lecture 11: Bayesian learning continued. Geoffrey Hinton

1 CSC321: 2011 Introduction to Neural Networks and Machine Learning. Lecture 11: Bayesian learning continued. Geoffrey Hinton

2 Bayes Theorem. The joint probability of the data and the weights can be factored either way: $p(D)\,p(W \mid D) = p(D, W) = p(W)\,p(D \mid W)$, so

$$p(W \mid D) = \frac{p(W)\,p(D \mid W)}{p(D)}$$

Here $p(W)$ is the prior probability of the weight vector, $p(W \mid D)$ is the posterior probability of the weight vector given the training data, and $p(D \mid W)$ is the probability of the observed data given the weights.

3 Maximum A Posteriori Learning. This trades off the prior probabilities of the parameters against the probability of the data given the parameters: it looks for the parameters that have the greatest product of the prior term and the likelihood term, i.e. that minimize the cost

$$\text{Cost} = -\log p(W \mid D) = -\log p(W) - \log p(D \mid W) + \log p(D)$$

Minimizing the squared weights is equivalent to maximizing the probability of the weights under a zero-mean Gaussian prior:

$$p(w) = \frac{1}{\sqrt{2\pi}\,\sigma_W}\, e^{-w^2 / 2\sigma_W^2}, \qquad -\log p(w) = \frac{w^2}{2\sigma_W^2} + k$$

4 The Bayesian interpretation of weight decay. Assuming a Gaussian prior for the weights, and assuming that the model makes a Gaussian prediction, the cost (up to a constant) is

$$C = \frac{1}{2\sigma_D^2} \sum_c (y_c - d_c)^2 + \frac{1}{2\sigma_W^2} \sum_i w_i^2$$

Rescaling by $2\sigma_D^2$ gives

$$C^* = E + \frac{\sigma_D^2}{\sigma_W^2} \sum_i w_i^2, \qquad E = \sum_c (y_c - d_c)^2$$

So the correct value of the weight decay parameter is the ratio of two variances. It's not just an arbitrary hack.
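As a minimal sketch, the rescaled cost $C^*$ can be written directly in code; the two variances are assumed to be known here, and the function name is illustrative:

```python
import numpy as np

def weight_decay_cost(w, y, d, sigma2_D, sigma2_W):
    """C* from the slide: squared error plus a weight penalty whose
    coefficient is the ratio of noise variance to prior variance."""
    E = np.sum((y - d) ** 2)                           # data misfit term
    return E + (sigma2_D / sigma2_W) * np.sum(w ** 2)  # plus weight decay
```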

5 Estimating the variance of the output noise. After we have learned a model that minimizes the squared error, we can find the best value for the output noise. The best value is the one that maximizes the probability of producing exactly the correct answers after adding Gaussian noise to the output produced by the neural net. The best value is found by simply using the variance of the residual errors.
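In code this estimate is essentially one line; a sketch with made-up targets and trained-net outputs:

```python
import numpy as np

d = np.array([0.9, 0.1, 0.4])          # made-up targets
y_pred = np.array([0.8, 0.2, 0.5])     # made-up trained-net outputs
residuals = d - y_pred
sigma2_noise = np.mean(residuals ** 2)  # ML estimate of the output-noise variance
```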

6 Estimating the variance of the Gaussian prior on the weights. After learning a model with some initial choice of variance for the weight prior, we could do a dirty trick called empirical Bayes: set the variance of the Gaussian prior to be whatever makes the weights that the model learned most likely. This is done by simply fitting a zero-mean Gaussian to the one-dimensional distribution of the learned weight values.
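Fitting a zero-mean Gaussian to the learned weights is equally short; the weight vector below is made up:

```python
import numpy as np

w = np.array([0.3, -1.2, 0.7, 0.1, -0.4, 0.9])  # made-up learned weights
sigma2_prior = np.mean(w ** 2)  # ML variance of a zero-mean Gaussian fit to w
```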

7 MacKay's quick and dirty method of choosing the ratio of the noise variance to the weight prior variance. Start with guesses for both the noise variance and the weight prior variance. Do some learning. Reset the noise variance to fit the residual errors. Reset the weight prior variance to fit the actual learned weights. Repeat until bored.
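A sketch of the whole loop, assuming hypothetical `train` and `predict` functions supplied elsewhere; only the two re-estimation lines come from the slide:

```python
import numpy as np

def mackay_loop(X, d, train, predict, iters=10):
    """Alternate between learning and refitting the two variances."""
    sigma2_noise, sigma2_prior = 1.0, 1.0                   # initial guesses
    for _ in range(iters):                                  # "repeat until bored"
        w = train(X, d, decay=sigma2_noise / sigma2_prior)  # do some learning
        residuals = d - predict(w, X)
        sigma2_noise = np.mean(residuals ** 2)  # refit the residual errors
        sigma2_prior = np.mean(w ** 2)          # refit the learned weights
    return sigma2_noise / sigma2_prior
```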

8 Full Bayesian Learning. Instead of trying to find the best single setting of the parameters (as in ML or MAP), compute the full posterior distribution over parameter settings. This is extremely computationally intensive for all but the simplest models (it's feasible for a biased coin). To make predictions, let each different setting of the parameters make its own prediction, and then combine all these predictions by weighting each of them by the posterior probability of that setting of the parameters. This is also computationally intensive. The full Bayesian approach allows us to use complicated models even when we do not have much data.
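For the biased-coin case the full posterior really is cheap; a sketch using a uniform prior over a grid of coin biases and made-up flip counts:

```python
import numpy as np

heads, tails = 53, 47                  # made-up flip counts
p_grid = np.linspace(0.01, 0.99, 99)   # possible settings of the coin's bias
log_lik = heads * np.log(p_grid) + tails * np.log(1 - p_grid)
post = np.exp(log_lik - log_lik.max())  # uniform prior: posterior ∝ likelihood
post /= post.sum()
# Each bias setting predicts "heads" with its own probability;
# combine the predictions weighted by posterior probability.
p_next_head = np.sum(post * p_grid)
```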

9 Overfitting: A frequentist illusion? If you do not have much data, you should use a simple model, because a complex one will overfit. This is true, but only if you assume that fitting a model means choosing a single best setting of the parameters. If you use the full posterior over parameter settings, overfitting disappears! With little data, you get very vague predictions because many different parameter settings have significant posterior probability.

10 A classic example of overfitting. Which model do you believe? The complicated model fits the data better, but it is not economical and it makes silly predictions. But what if we start with a reasonable prior over all fifth-order polynomials and use the full posterior distribution? Now we get vague and sensible predictions. There is no reason why the amount of data should influence our prior beliefs about the complexity of the model.
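A sketch of the polynomial version under standard conjugate-Gaussian assumptions (the prior precision `alpha` and noise precision `beta` are made-up values, not from the lecture); the predictive variance grows where data is scarce, which is the "vague and sensible" behaviour described above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 6)                              # a handful of made-up inputs
t = np.sin(np.pi * x) + 0.1 * rng.standard_normal(6)   # noisy targets

def phi(x, order=5):
    """Fifth-order polynomial features [1, x, ..., x^5]."""
    return np.vander(np.atleast_1d(x), order + 1, increasing=True)

alpha, beta = 1.0, 100.0                 # assumed prior / noise precisions
Phi = phi(x)
S = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
m = beta * S @ Phi.T @ t                 # posterior mean of the coefficients

p = phi(0.5)[0]                          # features of a test input
pred_mean = p @ m
pred_var = 1.0 / beta + p @ S @ p        # large away from the training data
```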

11 Approximating full Bayesian learning in a neural network. If the neural net only has a few parameters, we could put a grid over the parameter space and evaluate the posterior at each grid-point. This is expensive, but it does not involve any gradient descent and there are no local optimum issues. After evaluating each grid-point, we use all of them to make predictions on test data. This is also expensive, but it works much better than ML learning when the posterior is vague or multimodal (this happens when data is scarce):

$$p(d_{\text{test}} \mid \text{input}_{\text{test}}, D) = \sum_{g \in \text{grid}} p(W_g \mid D)\, p(d_{\text{test}} \mid \text{input}_{\text{test}}, W_g)$$

12 An example of full Bayesian learning. Allow each of the 6 weights or biases to have the 9 possible values [-2 : 0.5 : 2], so there are 9^6 grid-points in parameter space. For each grid-point, compute the probability of the observed outputs of all the training cases (this is the likelihood term and is explained on the next slide). Multiply the prior for each grid-point by the likelihood term and renormalize to get the posterior probability for each grid-point. Make predictions by using the posterior probabilities to average the predictions made by the different grid-points. [Figure: a neural net with 2 inputs, 1 output and 6 parameters, including bias weights.]

13 Computing the likelihood term for a logistic output unit. The output of the logistic unit is the probability that the network assigns to the answer 1; it assigns the complementary probability to the answer 0:

$$y = f(\text{input}, W_g)$$

$$p(\text{output} = d \mid \text{input}, W_g) = \begin{cases} y & \text{if } d = 1 \\ 1 - y & \text{if } d = 0 \end{cases} = y^d (1 - y)^{1-d}$$

$$p(\text{all training outputs} \mid W_g) = \prod_{\text{training cases } c} p(\text{output} = d_c \mid \text{input}_c, W_g)$$
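Putting slides 12 and 13 together as code. The exact wiring of the six-parameter net in the figure isn't recoverable, so a 2-2-1 layout with no hidden biases stands in for it, and the training cases are made up; the grid loop, likelihood product, renormalization and posterior-weighted prediction follow the slides:

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net_output(w, x):
    """Assumed stand-in: w[0:4] input-to-hidden, w[4:6] hidden-to-output."""
    h0 = sigmoid(w[0] * x[0] + w[1] * x[1])
    h1 = sigmoid(w[2] * x[0] + w[3] * x[1])
    return sigmoid(w[4] * h0 + w[5] * h1)

values = np.arange(-2.0, 2.01, 0.5)     # the 9 allowed values [-2 : 0.5 : 2]
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # made-up training inputs
d = np.array([0., 1., 1., 0.])          # made-up binary targets

grid, posts = [], []
for w in itertools.product(values, repeat=6):        # all 9**6 grid-points (slow!)
    y = np.array([net_output(w, x) for x in X])
    lik = np.prod(y ** d * (1.0 - y) ** (1.0 - d))   # slide 13's likelihood term
    grid.append(w)
    posts.append(lik)                                # uniform prior assumed
posts = np.array(posts)
posts /= posts.sum()                                 # renormalize to a posterior

x_test = np.array([1., 1.])
p_test = sum(p * net_output(w, x_test) for p, w in zip(posts, grid))
```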

14 What can we do if there are too many parameters for a grid to be feasible? The number of grid points is exponential in the number of parameters, so we cannot deal with more than a few parameters using a grid. If there is enough data to make most parameter vectors very unlikely, only a tiny fraction of the grid points makes a significant contribution to the predictions, and maybe we can just evaluate this tiny fraction. It might be good enough to just sample weight vectors according to their posterior probabilities:

$$p(y_{\text{test}} \mid \text{input}_{\text{test}}, D) = \sum_i p(W_i \mid D)\, p(y_{\text{test}} \mid \text{input}_{\text{test}}, W_i)$$

where we sample the weight vectors $W_i$ with probability $p(W_i \mid D)$.
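The sampling shortcut, reusing `grid`, `posts`, `net_output` and `x_test` from the previous sketch: draw a few hundred weight vectors with probability equal to their posterior and average only their predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.choice(len(grid), size=200, p=posts)  # sample W_i with probability p(W_i|D)
p_test_mc = np.mean([net_output(grid[i], x_test) for i in idx])
```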

15 One method for sampling weight vectors. In standard backpropagation we keep moving the weights in the direction that decreases the cost, i.e. the direction that increases the likelihood plus the prior, summed over all training cases. Suppose we add some Gaussian noise to the weight vector after each update. Then the weight vector never settles down; it keeps wandering around, but it tends to prefer low-cost regions of the weight space. Amazing fact: if we use just the right amount of noise, and if we let the weight vector wander around for long enough before we take a sample, we will get a sample from the true posterior over weight vectors. This is called a Markov Chain Monte Carlo method, and it makes it feasible to use full Bayesian learning with hundreds or thousands of parameters. There are related MCMC methods that are more complicated but more efficient: we don't need to let the weights wander around for so long before we get samples from the posterior.
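A minimal sketch of the noisy-update idea (Langevin dynamics) on a toy one-dimensional cost, so that "just the right amount of noise" is concrete: with step size ε the injected Gaussian noise has standard deviation sqrt(2ε). The quadratic cost here is made up; for a real net the gradient would come from backpropagation through the likelihood plus the prior:

```python
import numpy as np

def grad_cost(w):
    """Toy cost 0.5*(w - 1)^2, so exp(-cost) is the Gaussian posterior N(1, 1)."""
    return w - 1.0

rng = np.random.default_rng(0)
eps = 0.01                     # step size
w, samples = 0.0, []
for t in range(20000):
    w += -eps * grad_cost(w) + np.sqrt(2 * eps) * rng.standard_normal()
    if t >= 5000 and t % 10 == 0:   # let it wander first (burn-in), then sample
        samples.append(w)
# np.mean(samples) ≈ 1 and np.var(samples) ≈ 1: approximate draws from the posterior
```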
