Machine Learning 4771


Tony Jebara, Columbia University. Machine Learning 4771. Instructor: Tony Jebara.

Topic 20: HMMs with Evidence
HMM Collect, HMM Evaluate, HMM Distribute, HMM Decode, HMM Parameter Learning via JTA & EM

HMMs: JTA with Evidence
If the y sequence is observed (as it is in problems 1, 2 and 3) we get evidence, and the potentials turn into slices. The junction tree for the HMM has cliques ψ(x_t, x_{t+1}) along the backbone and ψ(x_t, y_t) hanging below, with separators φ(x_t) and ς(x_t).
Next, pick a root, for example the rightmost clique.
Collect all zeta separators bottom up:
ς*(x_t) = Σ_{y_t} ψ(x_t, y_t) δ(y_t, ȳ_t) = p(ȳ_t | x_t)
Collect the leftmost phi separator to the right:
φ*(x_0) = Σ_{y_0} ψ(x_0, y_0) δ(y_0, ȳ_0) = p(ȳ_0, x_0)
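
For a discrete HMM this slicing step is just indexing: ς*(x_t) = p(ȳ_t | x_t) is the column of the emission matrix selected by the observed symbol, and φ*(x_0) is the state prior times that column. A minimal NumPy sketch; the parameters and the names pi, A, E, obs are hypothetical, not from the slides:

import numpy as np

# Hypothetical 2-state HMM (rows of A and E sum to 1).
pi = np.array([0.5, 0.5])                      # p(x_0)
A  = np.array([[0.9, 0.1],
               [0.2, 0.8]])                    # A[i, j] = p(x_{t+1}=j | x_t=i)
E  = np.array([[0.7, 0.3],
               [0.4, 0.6]])                    # E[i, k] = p(y_t=k | x_t=i)
obs = [0, 1, 1]                                # observed symbol indices ybar_0..ybar_T

# Slicing the evidence: zeta-star separators, one emission-matrix column per observation.
zeta_star = [E[:, y] for y in obs]             # zeta*(x_t) = p(ybar_t | x_t)
phi_star_0 = pi * zeta_star[0]                 # phi*(x_0) = p(ybar_0, x_0)
print(zeta_star[0], phi_star_0)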

HMMs: Collect with Evidence
Now we collect (*) along the backbone, left to right. Update each clique with its left and bottom separators (the initial separators are all 1):
ψ*(x_t, x_{t+1}) = [φ*(x_t) / φ(x_t)] [ς*(x_{t+1}) / ς(x_{t+1})] ψ(x_t, x_{t+1})
φ*(x_{t+1}) = Σ_{x_t} ψ*(x_t, x_{t+1}) = Σ_{x_t} p(ȳ_{t+1} | x_{t+1}) p(x_{t+1} | x_t) φ*(x_t)   (the α recursion)
Keep going along the chain until the rightmost node. Note: the above formula for phi is recursive, so it could be used as is.
Property: recall we had
φ*(x_0) = p(ȳ_0, x_0)
φ*(x_1) = Σ_{x_0} p(ȳ_1 | x_1) p(x_1 | x_0) φ*(x_0) = p(ȳ_0, ȳ_1, x_1)
φ*(x_2) = Σ_{x_1} p(ȳ_2 | x_2) p(x_2 | x_1) φ*(x_1) = p(ȳ_0, ȳ_1, ȳ_2, x_2)
φ*(x_{t+1}) = Σ_{x_t} p(ȳ_{t+1} | x_{t+1}) p(x_{t+1} | x_t) φ*(x_t) = p(ȳ_0, ..., ȳ_{t+1}, x_{t+1})
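
A minimal sketch of this collect pass (the α recursion) in the same hypothetical NumPy setting; forward_collect is my own name:

import numpy as np

def forward_collect(pi, A, E, obs):
    # phi_star[t] holds phi*(x_t) = p(ybar_0..ybar_t, x_t) for every state value.
    phi_star = [pi * E[:, obs[0]]]
    for y in obs[1:]:
        # phi*(x_{t+1}) = p(ybar_{t+1} | x_{t+1}) * sum_{x_t} p(x_{t+1} | x_t) phi*(x_t)
        phi_star.append(E[:, y] * (phi_star[-1] @ A))
    return phi_star

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.2, 0.8]])
E  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(forward_collect(pi, A, E, obs=[0, 1, 1])[-1])   # p(ybar_0..ybar_T, x_T)

Each entry of phi_star[t] is a joint probability, so the vectors shrink toward zero as the sequence grows; the evaluation step on the next slide only needs the last one, summed.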

HMMs: Evaluate with Evidence
Say we are solving the first HMM problem:
1) Evaluate: given ȳ_0, ..., ȳ_T and θ, compute p(ȳ_0, ..., ȳ_T | θ).
If we want to compute the likelihood, we are already done! We really just need to do collect (not even distribute). From the previous slide we had
φ*(x_{t+1}) = Σ_{x_t} p(ȳ_{t+1} | x_{t+1}) p(x_{t+1} | x_t) φ*(x_t) = p(ȳ_0, ..., ȳ_{t+1}, x_{t+1})
Collect until the root (the rightmost node): its normalizer is p(evidence)!
ψ*(x_{T-1}, x_T) = p(ȳ_0, ..., ȳ_T, x_{T-1}, x_T)
Or use a hypothetical φ*(x_T); we can compute the likelihood just by marginalizing this phi:
p(ȳ_0, ..., ȳ_T) = Σ_{x_T} φ*(x_T)
So, adding up the entries in the last φ* gives us the likelihood.
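
Because the φ* entries underflow on long sequences, the same evaluation is usually carried out in log space. A sketch under the same hypothetical parameters; log_likelihood is my name and SciPy's logsumexp does the stable summation:

import numpy as np
from scipy.special import logsumexp

def log_likelihood(pi, A, E, obs):
    # log phi*(x_0), then the collect recursion entirely in log space.
    log_phi = np.log(pi) + np.log(E[:, obs[0]])
    for y in obs[1:]:
        log_phi = np.log(E[:, y]) + logsumexp(log_phi[:, None] + np.log(A), axis=0)
    return logsumexp(log_phi)          # log p(evidence): sum the entries of the last phi*

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.2, 0.8]])
E  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(np.exp(log_likelihood(pi, A, E, obs=[0, 1, 1])))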

HMMs: Distribute with Evidence
Back to collecting: say we just finished collecting to the root with our last update formula:
ψ*(x_{T-1}, x_T) = [φ*(x_{T-1}) ς*(x_T) / (φ(x_{T-1}) ς(x_T))] ψ(x_{T-1}, x_T) ∝ p(ȳ_0, ..., ȳ_T, x_{T-1}, x_T)
Now we distribute (**) along the backbone, right to left. First, the ** potential for the root stays the same:
ψ**(x_{T-1}, x_T) = ψ*(x_{T-1}, x_T)
Start going to the left from there, for t = T-1 down to 0:
(a) φ**(x_t) = Σ_{x_{t+1}} ψ**(x_t, x_{t+1})
(b) ς**(x_{t+1}) = Σ_{x_t} ψ**(x_t, x_{t+1})
(c) ψ**(x_{t-1}, x_t) = ψ*(x_{t-1}, x_t) φ**(x_t) / φ*(x_t)
(d) ψ**(x_{t+1}, y_{t+1}) = ψ(x_{t+1}, y_{t+1}) ς**(x_{t+1}) / ς*(x_{t+1})
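
A sketch of collect plus distribute in the same hypothetical setting, written with the equivalent forward-backward recursions rather than explicit clique updates; it returns the smoothed singleton marginals p(x_t | ȳ_0..ȳ_T), i.e. the normalized φ**(x_t):

import numpy as np

def smoothed_marginals(pi, A, E, obs):
    T = len(obs)
    # Collect: phi*(x_t) = p(ybar_0..ybar_t, x_t).
    phi = [pi * E[:, obs[0]]]
    for y in obs[1:]:
        phi.append(E[:, y] * (phi[-1] @ A))
    # Distribute: beta[t](x_t) = p(ybar_{t+1}..ybar_T | x_t), ones at the root.
    beta = [np.ones_like(pi) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (E[:, obs[t + 1]] * beta[t + 1])
    gamma = [f * b for f, b in zip(phi, beta)]
    return [g / g.sum() for g in gamma]        # p(x_t | ybar_0..ybar_T) for each t

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.2, 0.8]])
E  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(smoothed_marginals(pi, A, E, obs=[0, 1, 1]))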

HMM Example
You are given the parameters of a 2-state HMM. You observed the input sequence AB (from a 2-symbol alphabet, A or B). In other words, you observe two symbols from your finite state machine, A and then B. Using the junction tree algorithm, evaluate the likelihood of this data, p(y), given your HMM and its parameters. Also compute (for decoding) the individual marginals of the states after the evidence from this sequence is observed: p(x_0 | y) and p(x_1 | y). The parameters for the HMM are provided below: the initial state prior p(x_0), the state transition matrix p(x_t | x_{t-1}), and the emission matrix p(y_t | x_t), respectively.

π = p(x_0):
  state 1: 1/3    state 2: 2/3

a = p(x_t | x_{t-1})   (rows: x_{t-1}, columns: x_t):
              x_t = 1   x_t = 2
  x_{t-1} = 1   3/4       1/4
  x_{t-1} = 2   1/2       1/2

η = p(y_t | x_t)   (rows: x_t, columns: y_t):
              y_t = A   y_t = B
  x_t = 1       1/2       1/2
  x_t = 2       1/3       2/3
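
A sketch of this exercise in NumPy, assuming the tables above are read with the conditioning variable indexing the rows (my reading of the garbled layout); under that reading the collect pass gives p(y = AB) = 95/432 ≈ 0.22, and the distribute pass gives the two state posteriors:

import numpy as np

# Parameters as read from the tables above (rows index the conditioning variable).
pi  = np.array([1/3, 2/3])                     # p(x_0)
a   = np.array([[3/4, 1/4],
                [1/2, 1/2]])                   # a[i, j] = p(x_t=j | x_{t-1}=i)
eta = np.array([[1/2, 1/2],
                [1/3, 2/3]])                   # eta[i, k] = p(y_t=k | x_t=i), columns A, B
obs = [0, 1]                                   # the observed sequence "AB"

# Collect (forward) pass.
phi0 = pi * eta[:, obs[0]]                     # phi*(x_0) = p(A, x_0)
phi1 = eta[:, obs[1]] * (phi0 @ a)             # phi*(x_1) = p(A, B, x_1)
likelihood = phi1.sum()                        # p(y = AB); 95/432 under this reading

# Distribute (backward) pass for the state marginals.
beta0 = a @ eta[:, obs[1]]                     # p(B | x_0)
print(likelihood)
print(phi0 * beta0 / likelihood)               # p(x_0 | y)
print(phi1 / likelihood)                       # p(x_1 | y)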

HMM Example (continued)

HMMs: Marginals & Max-decoding
Now that JTA is finished, we have the following:
φ**(x_t) ∝ p(x_t | ȳ_0, ..., ȳ_T)
ς**(x_{t+1}) ∝ p(x_{t+1} | ȳ_0, ..., ȳ_T)
ψ**(x_t, x_{t+1}) ∝ p(x_t, x_{t+1} | ȳ_0, ..., ȳ_T)
The separators define a distribution over the hidden states. In the gene-finding example, this gives the probability that the state behind DNA symbol y_t was {I, E, P} (e.g. the path I I E E P over the symbols G A C C ...).
We have done 2) Decode: given ȳ_0, ..., ȳ_T and θ, find p(x_0), ..., p(x_T).
We can also do 2') Decode: given ȳ_0, ..., ȳ_T and θ, find x̂_0, ..., x̂_T, the most likely path. Here, we use the Argmax JTA algorithm: run JTA but replace the sums with max. Then, find the biggest entry in the separators:
x̂_t = argmax_{x_t} φ**(x_t)
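
A minimal sketch of the Argmax JTA (max-product, i.e. Viterbi) decode, reusing the example parameters under my row-wise reading of the tables; the A=0, B=1 encoding is mine:

import numpy as np

def viterbi(pi, A, E, obs):
    # Max-product messages in log space, with backpointers for decoding the path.
    delta = np.log(pi) + np.log(E[:, obs[0]])
    back = []
    for y in obs[1:]:
        scores = delta[:, None] + np.log(A)    # scores[i, j]: best log-prob of a path ending i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + np.log(E[:, y])
    path = [int(delta.argmax())]
    for bp in reversed(back):                  # follow the backpointers right to left
        path.append(int(bp[path[-1]]))
    return path[::-1]

pi  = np.array([1/3, 2/3])
a   = np.array([[3/4, 1/4], [1/2, 1/2]])
eta = np.array([[1/2, 1/2], [1/3, 2/3]])
print(viterbi(pi, a, eta, obs=[0, 1]))         # most likely state path for "AB"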

HMMs: EM Learning
Finally, 3) Max Likelihood: given ȳ_0, ..., ȳ_T, learn the parameters θ.
Recall maximum likelihood: θ̂ = argmax_θ log p(y | θ).
If we observe x, it is easy to maximize the complete likelihood:
log p(x, y) = log [ p(x_0) Π_{t=1}^{T} p(x_t | x_{t-1}) Π_{t=0}^{T} p(y_t | x_t) ]
            = log p(x_0) + Σ_{t=1}^{T} log p(x_t | x_{t-1}) + Σ_{t=0}^{T} log p(y_t | x_t)
Writing the states and symbols as binary indicator vectors:
l(θ) = Σ_{i=1}^{M} x_0^i log π_i + Σ_{t=1}^{T} Σ_{i,j=1}^{M} x_{t-1}^i x_t^j log α_ij + Σ_{t=0}^{T} Σ_{i=1}^{M} Σ_{j=1}^{N} x_t^i y_t^j log η_ij
Introduce Lagrange multipliers for the constraints Σ_i π_i = 1, Σ_j α_ij = 1, Σ_j η_ij = 1 and take derivatives:
π̂_i = x_0^i
α̂_ij = Σ_{t=0}^{T-1} x_t^i x_{t+1}^j / Σ_{k=1}^{M} Σ_{t=0}^{T-1} x_t^i x_{t+1}^k
η̂_ij = Σ_{t=0}^{T} x_t^i y_t^j / Σ_{k=1}^{N} Σ_{t=0}^{T} x_t^i y_t^k
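
With a fully observed state sequence these estimates reduce to normalized counts. A small sketch; the function name, the toy sequences, and the count representation are mine:

import numpy as np

def complete_data_mle(states, symbols, M, N):
    # Normalized indicator counts; assumes every state appears at least once.
    pi = np.zeros(M); alpha = np.zeros((M, M)); eta = np.zeros((M, N))
    pi[states[0]] = 1.0                                 # pi_i = x_0^i
    for i, j in zip(states[:-1], states[1:]):
        alpha[i, j] += 1                                # transition counts x_t^i x_{t+1}^j
    for i, j in zip(states, symbols):
        eta[i, j] += 1                                  # emission counts x_t^i y_t^j
    return pi, alpha / alpha.sum(axis=1, keepdims=True), eta / eta.sum(axis=1, keepdims=True)

print(complete_data_mle(states=[0, 0, 1, 1, 0], symbols=[0, 1, 1, 0, 0], M=2, N=2))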

HMMs: EM Learning
But we don't observe the x's, so the likelihood is incomplete: p(x, y | θ) → p(y | θ) = Σ_x p(x, y | θ).
EM: maximize the expected complete likelihood given the current p(x | y):
E{l(θ)} = E_{p(x|y)}{ log p(x, y) } = Σ_i E{x_0^i} log π_i + Σ_t Σ_{i,j} E{x_{t-1}^i x_t^j} log α_ij + Σ_t Σ_{i,j} E{x_t^i} y_t^j log η_ij
The M-step is maximizing as before:
π̂_i = E{x_0^i}
α̂_ij = Σ_t E{x_t^i x_{t+1}^j} / Σ_k Σ_t E{x_t^i x_{t+1}^k}
η̂_ij = Σ_t E{x_t^i} y_t^j / Σ_k Σ_t E{x_t^i} y_t^k
What are the E{}'s? Since the x's are binary indicators, E{x^i} = Σ_x p(x) x^i = Σ_x p(x) δ(x, x^i) = p(x^i), so
E{x_t^i} = p(x_t^i | ȳ) = φ**_i / Σ_i φ**_i
E{x_t^i x_{t+1}^j} = p(x_t^i, x_{t+1}^j | ȳ) = ψ**_ij / Σ_{ij} ψ**_ij
These are our JTA ψ and φ marginals! (JTA is the E-step for a given θ.)
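
A sketch of one full EM iteration under the same assumptions as the earlier sketches: the E-step computes the φ** and ψ** marginals with a forward-backward pass, and the M-step renormalizes the expected counts. em_step and the toy observation sequence are my own, and a real implementation would rescale the messages to avoid underflow:

import numpy as np

def em_step(pi, A, E, obs):
    # One EM (Baum-Welch) iteration for a single observation sequence.
    T, M = len(obs), len(pi)
    # E-step: forward (phi*) and backward messages.
    fwd = np.zeros((T, M)); bwd = np.ones((T, M))
    fwd[0] = pi * E[:, obs[0]]
    for t in range(1, T):
        fwd[t] = E[:, obs[t]] * (fwd[t - 1] @ A)
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (E[:, obs[t + 1]] * bwd[t + 1])
    evidence = fwd[-1].sum()
    gamma = fwd * bwd / evidence                        # E{x_t^i}: the phi** marginals
    xi = np.zeros((M, M))                               # summed E{x_t^i x_{t+1}^j}: the psi** marginals
    for t in range(T - 1):
        xi += np.outer(fwd[t], E[:, obs[t + 1]] * bwd[t + 1]) * A / evidence
    # M-step: renormalize the expected counts.
    pi_new = gamma[0]
    A_new = xi / xi.sum(axis=1, keepdims=True)
    E_new = np.zeros_like(E)
    for t, y in enumerate(obs):
        E_new[:, y] += gamma[t]
    E_new /= E_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, E_new

pi  = np.array([1/3, 2/3])
a   = np.array([[3/4, 1/4], [1/2, 1/2]])
eta = np.array([[1/2, 1/2], [1/3, 2/3]])
print(em_step(pi, a, eta, obs=[0, 1, 1, 0, 1]))         # the observation sequence here is mine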

Thank you!
So, to maximize the incomplete likelihood with EM: initialize the parameters randomly, run the Junction Tree Algorithm to get the marginals, use the marginals over the x's in the maximum likelihood step, and iterate.
Please complete the course evaluation on Courseworks.
Good luck with finals week and happy holidays!