Factor Analysis with Poisson Output

Gopal Santhanam, Byron Yu, Krishna V. Shenoy
Department of Electrical Engineering, Neurosciences Program
Stanford University, Stanford, CA 94305, USA
{gopal,byronyu,shenoy}@stanford.edu

Maneesh Sahani
Gatsby Computational Neuroscience Unit, University College London
17 Queen Square, London WC1N 3AR, UK
maneesh@gatsby.ucl.ac.uk

Technical Report NPSL-TR-06-1
March 9, 2006

Abstract

We derive a modified version of factor analysis for data that is Poisson rather than Gaussian distributed. This modified approach may better fit certain classes of data, including neuronal spiking data commonly collected in electrophysiology experiments.

1 Introduction

Factor analysis and other similar dimensionality reduction approaches (e.g., PCA or SPCA) are derived using a state-space model. The latent state is modeled as a Gaussian distribution. The observed output is modeled as a linear function of the latent state with additive Gaussian noise. This approach can provide the benefit of reducing the dimensionality of the observed, but noisy, data to a small number of underlying factors. These factors may then be used to provide meaningful predictions on new data.

For count, or point process, data, the Gaussian output noise model used in factor analysis may not provide a good description of the data. Instead, we modify the output noise model to be Poisson. Additionally, we extend the state-space model to incorporate a mixture of Gaussians rather than a single Gaussian distribution. This extension can serve to better model the latent state, especially when there is an a priori expectation that the data is clustered. Once trained, the model can be used to make predictions of the latent or unobserved state for new observed data. We dub our new approach Factor Analysis with Poisson Output, or FAPO for short.

2 Generative Model

The generative model for FAPO is given below.

\[
x \mid s \sim \mathcal{N}(\mu_s, \Sigma_s) \tag{1}
\]
\[
y_i \mid x \sim \mathrm{Poisson}\!\left( h(c_i^\top x + d_i)\,\Delta \right) \quad \text{for } i = 1, \dots, q \tag{2}
\]
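To make the generative process concrete, here is a minimal sampling sketch in Python (ours, not the report's; the function and variable names are illustrative), assuming the exponential link h(z) = e^z introduced below.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_fapo(N, pi, mus, Sigmas, C, d, delta, h=np.exp):
    """Draw N observations (s, x, y) from the FAPO generative model:
    s ~ pi, x | s ~ N(mu_s, Sigma_s), y_i | x ~ Poisson(h(c_i'x + d_i) * delta).
    pi: (M,); mus: (M, p); Sigmas: (M, p, p); C: (q, p) with rows c_i; d: (q,)."""
    s = rng.choice(len(pi), size=N, p=pi)               # mixture indicators
    x = np.stack([rng.multivariate_normal(mus[m], Sigmas[m]) for m in s])
    rates = h(x @ C.T + d) * delta                      # (N, q) Poisson means
    y = rng.poisson(rates)                              # observed spike counts
    return s, x, y

# toy setting: M = 2 clusters, p = 2 latent dimensions, q = 5 channels
pi = np.array([0.6, 0.4])
mus = np.array([[0.0, 0.0], [1.0, -1.0]])
Sigmas = np.stack([np.eye(2), 0.5 * np.eye(2)])
C = rng.normal(size=(5, 2))
d = np.full(5, 1.0)
s, x, y = sample_fapo(1000, pi, mus, Sigmas, C, d, delta=0.05)
```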

The random variable s is the mixture component indicator and has a discrete probability distribution over {1, ..., M}, i.e., P(s = m) = π_m. Given s, the latent state vector, x ∈ R^p, is Gaussian distributed with mean µ_s and covariance Σ_s. The outputs, y_i ∈ N_0, are generated from a Poisson distribution, where h is a link function mapping R → R^+, c_i ∈ R^p and d_i ∈ R are constants, and ∆ is the time bin width. We collect the counts from all q simultaneously observed variables into a vector y ∈ N_0^q whose ith element is y_i. The choice of the link function h is discussed in the following sections.

In this work, we assume that all of the parameters of the model, namely π, µ_s, Σ_s, c_i, and d_i for s ∈ {1, ..., M} and i ∈ {1, ..., q}, are unknown. The goal is to learn the parameters so that the model can be used to make predictions of x and s for a new y.

3 System Identification

The procedure of system identification, or model training, requires learning the parameters from the observed data. The observed data include N observations of y, an i.i.d. sequence y_1, y_2, ..., y_N denoted by {y}, and N observations of the mixture component indicator, s, an i.i.d. sequence s_1, s_2, ..., s_N denoted by {s}. The latent state vectors are hidden and not observed. This situation is an unsupervised problem, although not completely unsupervised; the system identification would be more challenging if s were also unknown. That latter scenario is beyond the scope of this article. Once the model is trained, however, we estimate the most likely x and s for new observed data y, as described in the following section.

The standard approach to system identification in the presence of unobserved latent variables is the Expectation-Maximisation (EM) algorithm. The algorithm maximises the likelihood of the model parameters, i.e., θ = {π, µ_{1,...,M}, Σ_{1,...,M}, c_{1,...,q}, d_{1,...,q}}, over the observed data. The algorithm is iterative and each iteration is performed in two parts, the expectation (E) step and the maximisation (M) step. Iterations are performed until the likelihood converges.

3.1 E-step

The E-step of EM requires computing the expected log joint likelihood, E[log P({x}, {y}, {s} | θ)], over the posterior distribution of the hidden state vectors, P({x} | {y}, {s}, θ_k), where θ_k are the parameter estimates at the kth EM iteration. Since the observations are i.i.d., we can equivalently maximise the sum of the individual expected log joint likelihoods, E[log P(x, y, s | θ)]. The posterior distribution can be expressed as follows:

\[
P(x \mid y, s, \theta_k) \propto P(y \mid x, \theta_k)\, P(x \mid s, \theta_k). \tag{3}
\]

Because P(y | x) is a product of Poissons rather than a Gaussian, the state posterior P(x | y) will not be of a form that allows for easy computation of the log joint likelihood. Instead, as in [1], we approximate this posterior with a Gaussian centered at the mode of log P(x | y) and whose covariance is given by the negative inverse Hessian of the log posterior at that mode. Certain choices of h, including h_1(z) = e^z and h_2(z) = log(1 + e^z), lead to a log posterior that is strictly concave in x. In these cases, the unique mode can easily be found by Newton's method.

\begin{align}
\log P(x \mid y, s, \theta) &= \log P(y \mid x, \theta) + \log P(x \mid s, \theta) + C_1 \notag \\
&= \sum_{i=1}^{q} \log P(y_i \mid x) + \log \mathcal{N}(x \mid \mu_s, \Sigma_s) + C_2 \notag \\
&= \sum_{i=1}^{q} \left[ -h(c_i^\top x + d_i)\,\Delta + y_i \log h(c_i^\top x + d_i) - \log y_i! \right] + \log \mathcal{N}(x \mid \mu_s, \Sigma_s) + C_3 \notag \\
&= \sum_{i=1}^{q} \left[ -h(c_i^\top x + d_i)\,\Delta + y_i \log h(c_i^\top x + d_i) \right] - \tfrac{1}{2} x^\top \Sigma_s^{-1} x + \mu_s^\top \Sigma_s^{-1} x + C_4 \tag{4}
\end{align}

Taking the gradient and Hessian of (4) with respect to x results in the following expressions:

\[
\nabla_x \log P(x \mid y, s, \theta) = \sum_{i=1}^{q} \left[ -\nabla_x h(c_i^\top x + d_i)\,\Delta + y_i\, \nabla_x \log h(c_i^\top x + d_i) \right] - \Sigma_s^{-1} x + \Sigma_s^{-1} \mu_s
\]
\[
\nabla_x^2 \log P(x \mid y, s, \theta) = \sum_{i=1}^{q} \left[ -\nabla_x^2 h(c_i^\top x + d_i)\,\Delta + y_i\, \nabla_x^2 \log h(c_i^\top x + d_i) \right] - \Sigma_s^{-1}
\]

Let ζ_i = c_i^⊤ x + d_i. For the aforementioned versions h_1 and h_2, the gradients and Hessians are

\begin{align}
\nabla_x \log P(x \mid y, s, \theta) &= \sum_{i=1}^{q} \left( -e^{\zeta_i} \Delta\, c_i + y_i\, c_i \right) - \Sigma_s^{-1} x + \Sigma_s^{-1} \mu_s \tag{5} \\
&= \sum_{i=1}^{q} \left( -e^{\zeta_i} \Delta + y_i \right) c_i - \Sigma_s^{-1} x + \Sigma_s^{-1} \mu_s \tag{6}
\end{align}
\[
\nabla_x^2 \log P(x \mid y, s, \theta) = -\sum_{i=1}^{q} e^{\zeta_i} \Delta\, c_i c_i^\top - \Sigma_s^{-1} \tag{7}
\]

and

\[
\nabla_x \log P(x \mid y, s, \theta) = \sum_{i=1}^{q} \left( -\Delta + \frac{y_i}{\log(1 + e^{\zeta_i})} \right) \frac{e^{\zeta_i}}{1 + e^{\zeta_i}}\, c_i - \Sigma_s^{-1} x + \Sigma_s^{-1} \mu_s \tag{8}
\]
\[
\nabla_x^2 \log P(x \mid y, s, \theta) = \sum_{i=1}^{q} \left[ -\frac{e^{\zeta_i}}{(1 + e^{\zeta_i})^2}\,\Delta + y_i\, \frac{e^{\zeta_i} \log(1 + e^{\zeta_i}) - e^{2\zeta_i}}{(1 + e^{\zeta_i})^2 \log^2(1 + e^{\zeta_i})} \right] c_i c_i^\top - \Sigma_s^{-1}, \tag{9}
\]

respectively.
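As an illustration of this step, here is a minimal sketch (ours, not the report's; the interface is illustrative) of the Laplace approximation via Newton's method for the h_1(z) = e^z link, implementing equations (5)-(7).

```python
import numpy as np

def posterior_mode(y, mu_s, Sigma_s, C, d, delta, n_iter=50, tol=1e-10):
    """Laplace approximation of P(x | y, s): Newton's method on the strictly
    concave log posterior with the exponential link (eqs. 5-7). Returns the
    mode xi and the covariance Psi = -H^{-1} at the mode."""
    Sinv = np.linalg.inv(Sigma_s)
    x = mu_s.copy()                                     # start at the prior mean
    for _ in range(n_iter):
        zeta = C @ x + d                                # zeta_i = c_i^T x + d_i
        grad = C.T @ (y - np.exp(zeta) * delta) - Sinv @ (x - mu_s)   # eq. (6)
        H = -(C.T * (np.exp(zeta) * delta)) @ C - Sinv                # eq. (7)
        step = np.linalg.solve(H, grad)
        x = x - step                                    # Newton ascent step
        if np.linalg.norm(step) < tol:
            break
    Psi = np.linalg.inv(-H)                             # Laplace covariance
    return x, Psi
```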

For observation n, let Q_n be a Gaussian distribution in R^p that approximates P(x_n | y_n, s_n, θ_k) and has mean ξ_n and covariance Ψ_n. The expectation of the log joint likelihood for a given observation can be expressed as follows:

\begin{align}
E_n &= \mathbb{E}_{Q_n}\!\left[ \log P(x_n, y_n, s_n \mid \theta) \right] \tag{10} \\
&= \mathbb{E}_{Q_n}\!\left[ \sum_{i=1}^{q} \log P(y_{n,i} \mid x_n) + \log P(x_n \mid s_n) + \log P(s_n) \right] \tag{11} \\
&= \mathbb{E}_{Q_n}\!\Big[ \sum_{i=1}^{q} \left( -h(c_i^\top x_n + d_i)\,\Delta + y_{n,i} \log h(c_i^\top x_n + d_i) - \log y_{n,i}! \right) - \tfrac{p}{2} \log 2\pi - \tfrac{1}{2} \log |\Sigma_{s_n}| \notag \\
&\qquad\quad - \tfrac{1}{2} x_n^\top \Sigma_{s_n}^{-1} x_n + \mu_{s_n}^\top \Sigma_{s_n}^{-1} x_n - \tfrac{1}{2} \mu_{s_n}^\top \Sigma_{s_n}^{-1} \mu_{s_n} + \log P(s_n) \Big] \tag{12}
\end{align}

The terms that do not depend on x_n or any component of θ can be grouped as a constant, C, outside the expectation. Doing so, and also moving terms that do not depend on x_n outside the expectation, we have

\begin{align}
E_n &= \mathbb{E}_{Q_n}\!\left[ \sum_{i=1}^{q} \left( -h(c_i^\top x_n + d_i)\,\Delta + y_{n,i} \log h(c_i^\top x_n + d_i) \right) \right] - \tfrac{1}{2}\, \mathbb{E}_{Q_n}\!\left[ x_n^\top \Sigma_{s_n}^{-1} x_n \right] + \mu_{s_n}^\top \Sigma_{s_n}^{-1}\, \mathbb{E}_{Q_n}[x_n] \notag \\
&\qquad - \tfrac{1}{2} \mu_{s_n}^\top \Sigma_{s_n}^{-1} \mu_{s_n} - \tfrac{1}{2} \log |\Sigma_{s_n}| + C \tag{13} \\
&= \sum_{i=1}^{q} \mathbb{E}_{Q_n}\!\left[ -h(c_i^\top x_n + d_i)\,\Delta + y_{n,i} \log h(c_i^\top x_n + d_i) \right] - \tfrac{1}{2} \operatorname{Tr}\!\left[ \Sigma_{s_n}^{-1} \left( \Psi_n + \xi_n \xi_n^\top \right) \right] + \mu_{s_n}^\top \Sigma_{s_n}^{-1} \xi_n \notag \\
&\qquad - \tfrac{1}{2} \mu_{s_n}^\top \Sigma_{s_n}^{-1} \mu_{s_n} - \tfrac{1}{2} \log |\Sigma_{s_n}| + C, \tag{14}
\end{align}

where (13) is simplified to (14) by using the following relationship:

\[
\mathbb{E}_{Q_n}\!\left[ x_n^\top \Sigma^{-1} x_n \right]
= \mathbb{E}_{Q_n}\!\left[ \operatorname{Tr}\!\left( x_n^\top \Sigma^{-1} x_n \right) \right]
= \mathbb{E}_{Q_n}\!\left[ \operatorname{Tr}\!\left( \Sigma^{-1} x_n x_n^\top \right) \right]
= \operatorname{Tr}\!\left( \Sigma^{-1}\, \mathbb{E}_{Q_n}\!\left[ x_n x_n^\top \right] \right)
= \operatorname{Tr}\!\left( \Sigma^{-1} \left( \Psi_n + \xi_n \xi_n^\top \right) \right).
\]

Because the posterior state distributions are approximated as Gaussians in the E-step, the expectation in (14) is a Gaussian integral that involves non-linear functions g and h and cannot be computed analytically in general. Fortunately, this high-dimensional integral can be reduced to a one-dimensional Gaussian integral with mean c_i^⊤ ξ_n and variance c_i^⊤ Ψ_n c_i. The expectation of the log joint likelihood over all of the N observations is simply the sum of the individual E_n terms:

\[
E = \mathbb{E}_Q\!\left[ \log P(\{x\}, \{y\}, \{s\} \mid \theta) \right] = \sum_{n=1}^{N} E_n.
\]
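The one-dimensional reduction just described is easy to evaluate numerically. The sketch below (illustrative, not from the report) computes one expectation term of (14) by Gauss-Hermite quadrature, assuming the h_2(z) = log(1 + e^z) link; it anticipates the quadrature rule given in (23) below.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def e_step_term(y_i, c_i, d_i, xi_n, Psi_n, delta, J=20):
    """E_{Q_n}[ -h(c_i^T x + d_i)*delta + y_i * log h(c_i^T x + d_i) ],
    reduced to a 1-D Gaussian integral with mean c_i^T xi_n and variance
    c_i^T Psi_n c_i, evaluated with J Gauss-Hermite points.
    Uses h(z) = log(1 + e^z)."""
    m = c_i @ xi_n + d_i
    v = c_i @ Psi_n @ c_i
    nodes, weights = hermgauss(J)                 # rule for weight exp(-t^2)
    z = m + np.sqrt(2.0 * v) * nodes              # change of variables
    hz = np.logaddexp(0.0, z)                     # h(z) = log(1 + e^z)
    fz = -hz * delta + y_i * np.log(hz)
    return (weights @ fz) / np.sqrt(np.pi)
```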

3.2 M-step

The M-step requires finding the θ̂_{k+1} that satisfies

\[
\hat{\theta}_{k+1} = \arg\max_{\theta}\; \mathbb{E}_Q\!\left[ \log P(\{x\}, \{y\}, \{s\} \mid \theta) \right]. \tag{15}
\]

This can be achieved by differentiating E with respect to the parameters, θ, as shown below. The indicator function I(s_n = m) will prove useful. Also, let N_m = Σ_{n=1}^N I(s_n = m).

Prior probability of mixture component identity m:

\[
\pi_m = \frac{N_m}{N} \tag{16}
\]

State vector mean, for mixture component identity m:

\[
\frac{\partial E}{\partial \mu_m} = \sum_{n=1}^{N} I(s_n = m) \left( \Sigma_m^{-1} \xi_n - \Sigma_m^{-1} \mu_m \right) = 0
\quad \Longrightarrow \quad
\mu_m^{k+1} = \frac{1}{N_m} \sum_{n=1}^{N} I(s_n = m)\, \xi_n \tag{17}
\]

State vector covariance, for mixture component identity m: setting the derivative with respect to Σ_m^{-1} to zero,

\[
\frac{\partial E}{\partial \Sigma_m^{-1}} = \sum_{n=1}^{N} I(s_n = m) \left[ \tfrac{1}{2} \Sigma_m - \tfrac{1}{2} \left( \Psi_n + \xi_n \xi_n^\top \right) + \mu_m \xi_n^\top - \tfrac{1}{2} \mu_m \mu_m^\top \right] = 0,
\]

and substituting µ_m^{k+1} from (17), gives

\[
\Sigma_m^{k+1} = \frac{1}{N_m} \sum_{n=1}^{N} I(s_n = m) \left( \Psi_n + \xi_n \xi_n^\top \right) - \mu_m^{k+1} \left( \mu_m^{k+1} \right)^\top. \tag{18}
\]
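The closed-form updates (16)-(18) are straightforward to implement. A minimal sketch follows (our own code, with illustrative names); it covers only the mixture parameters, since the observation mapping constants are handled separately below.

```python
import numpy as np

def m_step_mixture(s, xi, Psi, M):
    """Closed-form M-step updates (eqs. 16-18) for the mixture parameters.
    s: (N,) observed component indicators; xi: (N, p) posterior means;
    Psi: (N, p, p) posterior covariances from the E-step.
    Assumes every component appears at least once in the training set."""
    N, p = xi.shape
    pi = np.zeros(M)
    mus = np.zeros((M, p))
    Sigmas = np.zeros((M, p, p))
    for m in range(M):
        idx = (s == m)
        pi[m] = idx.sum() / N                                  # eq. (16)
        mus[m] = xi[idx].mean(axis=0)                          # eq. (17)
        second = (Psi[idx] +
                  np.einsum('ni,nj->nij', xi[idx], xi[idx])).mean(axis=0)
        Sigmas[m] = second - np.outer(mus[m], mus[m])          # eq. (18)
    return pi, mus, Sigmas
```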

Observation mapping constants: we want to maximise the following objective function with respect to c_i and d_i:

\begin{align}
\tilde{E} &= \sum_{n=1}^{N} \sum_{i=1}^{q} \mathbb{E}_{Q_n}\!\left[ -h(c_i^\top x_n + d_i)\,\Delta + y_{n,i} \log h(c_i^\top x_n + d_i) \right] \notag \\
&= \sum_{n=1}^{N} \mathbb{E}_{Q_n}\!\left[ -h(c_i^\top x_n + d_i)\,\Delta + y_{n,i} \log h(c_i^\top x_n + d_i) \right] + C, \tag{19}
\end{align}

where the terms involving the other output indices have been absorbed into the constant C.

First, let us instead examine the following more general problem: maximise the objective function E_x[g(c^⊤ x + d)] with respect to c and d, where g is concave and x is Gaussian distributed with mean ξ and covariance Ψ. Defining the new variables c̃ = [c^⊤ d]^⊤ and x̃ = [x^⊤ 1]^⊤, the objective function can be equivalently expressed as

\[
O = \mathbb{E}_{\tilde{x}}\!\left[ g(\tilde{c}^\top \tilde{x}) \right]
= \int g(\tilde{c}^\top \tilde{x})\, \mathcal{N}(\tilde{x} \mid \tilde{\xi}, \tilde{\Psi})\, d\tilde{x}
= \int g(z)\, \mathcal{N}\!\left(z \mid \tilde{c}^\top \tilde{\xi},\; \tilde{c}^\top \tilde{\Psi} \tilde{c}\right) dz, \tag{20}
\]

where ξ̃ = [ξ^⊤ 1]^⊤ and Ψ̃ is a matrix in R^{(p+1)×(p+1)} with the upper-left p×p sub-matrix equal to Ψ and the rest of the elements set to zero. This objective function can be maximised using Newton's method since it is concave in c̃. However, to perform this optimisation, we require the gradient and the Hessian of O. The gradient can be obtained as shown below:

\[
\frac{\partial O}{\partial \tilde{c}}
= \frac{\partial}{\partial \tilde{c}} \int g(z)\, \mathcal{N}\!\left(z \mid \tilde{c}^\top \tilde{\xi},\; \tilde{c}^\top \tilde{\Psi} \tilde{c}\right) dz
= \int g(z)\, \frac{\partial}{\partial \tilde{c}} \left[ \frac{1}{\sqrt{2\pi\, \tilde{c}^\top \tilde{\Psi} \tilde{c}}} \exp\!\left( -\frac{(z - \tilde{c}^\top \tilde{\xi})^2}{2\, \tilde{c}^\top \tilde{\Psi} \tilde{c}} \right) \right] dz. \tag{21}
\]

Taking the partial derivative in (21) requires the following quantities:

\[
\frac{\partial}{\partial \tilde{c}} \left( \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)^{-1/2} = -\left( \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)^{-3/2} \tilde{\Psi} \tilde{c}
\]
\[
\frac{\partial}{\partial \tilde{c}} \exp\!\left( -\frac{(z - \tilde{c}^\top \tilde{\xi})^2}{2\, \tilde{c}^\top \tilde{\Psi} \tilde{c}} \right)
= \exp\!\left( -\frac{(z - \tilde{c}^\top \tilde{\xi})^2}{2\, \tilde{c}^\top \tilde{\Psi} \tilde{c}} \right)
\left[ \frac{(z - \tilde{c}^\top \tilde{\xi})\, \tilde{\xi}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} + \frac{(z - \tilde{c}^\top \tilde{\xi})^2\, \tilde{\Psi} \tilde{c}}{\left( \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)^2} \right]
\]

Using the equations above, we can reduce (21) to

\[
\frac{\partial O}{\partial \tilde{c}} = \int g(z) \left[ \frac{(z - \tilde{c}^\top \tilde{\xi})\, \tilde{\xi}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} - \frac{\tilde{\Psi} \tilde{c}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} + \frac{(z - \tilde{c}^\top \tilde{\xi})^2\, \tilde{\Psi} \tilde{c}}{\left( \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)^2} \right] \mathcal{N}\!\left(z \mid \tilde{c}^\top \tilde{\xi},\; \tilde{c}^\top \tilde{\Psi} \tilde{c}\right) dz. \tag{22}
\]
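As a quick numerical check of the equivalence in (20), the sketch below (illustrative, not part of the report) evaluates O both as the one-dimensional integral over z and as a Monte Carlo average over x, for a concave g of the kind appearing in (19).

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(2)

def O_reduced(c, d, xi, Psi, g):
    """Evaluate O = E_x[g(c^T x + d)] via the 1-D form in eq. (20),
    using the augmented variables c~ = [c; d], xi~ = [xi; 1]."""
    ct = np.concatenate([c, [d]])
    xit = np.concatenate([xi, [1.0]])
    Pt = np.zeros((len(ct), len(ct)))
    Pt[:-1, :-1] = Psi                               # Psi~ (zero-padded)
    m = ct @ xit                                     # 1-D mean c~^T xi~
    v = ct @ Pt @ ct                                 # 1-D variance c~^T Psi~ c~
    integrand = lambda z: g(z) * np.exp(-(z - m)**2 / (2*v)) / np.sqrt(2*np.pi*v)
    val, _ = quad(integrand, m - 10*np.sqrt(v), m + 10*np.sqrt(v))
    return val

# sanity check against a Monte Carlo estimate of E_x[g(c^T x + d)]
p = 3
c = rng.normal(size=p); d = 0.5
xi = rng.normal(size=p)
A = rng.normal(size=(p, p)); Psi = A @ A.T + np.eye(p)
g = lambda z: -np.exp(z) * 0.05 + 2.0 * z            # e.g. -h(z)*delta + y*z
xs = rng.multivariate_normal(xi, Psi, size=100000)
print(O_reduced(c, d, xi, Psi, g), np.mean(g(xs @ c + d)))
```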

While there does not exist a convenient analytic solution to the integral in (22), it can be accurately and reasonably efficiently approximated using Gaussian quadrature [2, 3]. Specifically, the Gaussian quadrature rule states that

\[
\int f(z)\, \mathcal{N}(z \mid \mu, \sigma^2)\, dz \approx \sum_{j=1}^{J} w_j\, f(Z_j) \quad \text{for } Z_j = \mu + \gamma_j \sigma, \tag{23}
\]

for any function f, where the w_j are the quadrature weights and the γ_j are the normalised quadrature points. Identifying the function f in (22) and substituting Z_j = c̃^⊤ξ̃ + γ_j (c̃^⊤Ψ̃c̃)^{1/2}, the quadrature function for the gradient is

\[
f(\gamma_j) = g\!\left( \tilde{c}^\top \tilde{\xi} + \gamma_j \sqrt{\tilde{c}^\top \tilde{\Psi} \tilde{c}} \right)
\left[ \frac{\gamma_j\, \tilde{\xi}}{\sqrt{\tilde{c}^\top \tilde{\Psi} \tilde{c}}} + \left( \gamma_j^2 - 1 \right) \frac{\tilde{\Psi} \tilde{c}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} \right]. \tag{24}
\]

Likewise, the same procedure must be performed to find the Hessian of the objective function:

\[
\frac{\partial^2 O}{\partial \tilde{c}\, \partial \tilde{c}^\top} = \int g(z)\, \frac{\partial}{\partial \tilde{c}} \left\{ \left[ \frac{(z - \tilde{c}^\top \tilde{\xi})\, \tilde{\xi}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} - \frac{\tilde{\Psi} \tilde{c}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} + \frac{(z - \tilde{c}^\top \tilde{\xi})^2\, \tilde{\Psi} \tilde{c}}{(\tilde{c}^\top \tilde{\Psi} \tilde{c})^2} \right] \mathcal{N}\!\left(z \mid \tilde{c}^\top \tilde{\xi},\; \tilde{c}^\top \tilde{\Psi} \tilde{c}\right) \right\} dz. \tag{25}
\]

The following quantities will be useful:

\[
\frac{\partial}{\partial \tilde{c}}\, \frac{\tilde{\Psi} \tilde{c}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}}
= \frac{\tilde{\Psi}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} - \frac{2\, \tilde{\Psi} \tilde{c}\, \tilde{c}^\top \tilde{\Psi}}{\left( \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)^2}, \qquad
\frac{\partial}{\partial \tilde{c}}\, \frac{z - \tilde{c}^\top \tilde{\xi}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}}
= -\frac{\tilde{\xi}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} - \frac{2 (z - \tilde{c}^\top \tilde{\xi})\, \tilde{\Psi} \tilde{c}}{\left( \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)^2}.
\]

Next, define the function a as

\[
a(\gamma_j) = \frac{\gamma_j\, \tilde{\xi}}{\sqrt{\tilde{c}^\top \tilde{\Psi} \tilde{c}}} + \left( \gamma_j^2 - 1 \right) \frac{\tilde{\Psi} \tilde{c}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}}, \tag{26}
\]

which is the bracketed vector in (24). Substituting these expressions into (25) and evaluating at Z_j, the quadrature function for the Hessian is

\[
f(\gamma_j) = g\!\left( \tilde{c}^\top \tilde{\xi} + \gamma_j \sqrt{\tilde{c}^\top \tilde{\Psi} \tilde{c}} \right)
\left[ a(\gamma_j)\, a(\gamma_j)^\top - \frac{\tilde{\xi} \tilde{\xi}^\top}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} + \left( \gamma_j^2 - 1 \right) \frac{\tilde{\Psi}}{\tilde{c}^\top \tilde{\Psi} \tilde{c}} - \frac{2 \gamma_j \left( \tilde{\xi}\, \tilde{c}^\top \tilde{\Psi} + \tilde{\Psi} \tilde{c}\, \tilde{\xi}^\top \right)}{\left( \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)^{3/2}} + \left( 2 - 4 \gamma_j^2 \right) \frac{\tilde{\Psi} \tilde{c}\, \tilde{c}^\top \tilde{\Psi}}{\left( \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)^2} \right]. \tag{27}
\]
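In code, the rule (23) with the gradient function (24) might look like the sketch below (illustrative, not from the report). We obtain the normalised points and weights from NumPy's Gauss-Hermite routine via the change of variables γ_j = √2 t_j, w_j = w̃_j/√π.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def objective_gradient(c_tilde, xi_tilde, Psi_tilde, g, J=20):
    """Gradient of O = E[g(c~^T x~)] via the quadrature rule of eq. (23),
    using the quadrature function f(gamma_j) of eqs. (24) and (26).
    g is any scalar function, e.g. g(z) = -exp(z)*delta + y*z."""
    m = c_tilde @ xi_tilde                            # c~^T xi~
    v = c_tilde @ Psi_tilde @ c_tilde                 # sigma^2 = c~^T Psi~ c~
    sig = np.sqrt(v)
    Pc = Psi_tilde @ c_tilde
    nodes, weights = hermgauss(J)
    gammas = np.sqrt(2.0) * nodes                     # normalised points
    w = weights / np.sqrt(np.pi)                      # normalised weights
    grad = np.zeros_like(c_tilde)
    for wj, gj in zip(w, gammas):
        a = gj * xi_tilde / sig + (gj**2 - 1.0) * Pc / v   # a(gamma_j)
        grad += wj * g(m + gj * sig) * a
    return grad
```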

For certain choices of h it is possible to compute the gradient and Hessian analytically. To illustrate, we start with the following form:

\[
\int g(z)\, \mathcal{N}(z \mid \mu, \sigma^2)\, dz = \int g(z)\, \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(z - \mu)^2}{2\sigma^2} \right) dz \tag{28}
\]

for g(z) = e^z. The classic method to solve this integral is to complete the square in the exponent:

\begin{align}
\int e^{z}\, \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(z - \mu)^2}{2\sigma^2} \right) dz
&= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{z^2 - 2\mu z - 2\sigma^2 z + \mu^2}{2\sigma^2} \right) dz \notag \\
&= \exp\!\left( \mu + \frac{\sigma^2}{2} \right) \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\left( z - (\mu + \sigma^2) \right)^2}{2\sigma^2} \right) dz \notag \\
&= \exp\!\left( \mu + \frac{\sigma^2}{2} \right) \int \mathcal{N}\!\left(z \mid \mu + \sigma^2, \sigma^2\right) dz
= \exp\!\left( \mu + \frac{\sigma^2}{2} \right). \tag{29}
\end{align}

Relating this form back to (20), with µ = c̃^⊤ξ̃ and σ² = c̃^⊤Ψ̃c̃, the gradient with respect to c̃ is

\[
\frac{\partial O}{\partial \tilde{c}}
= \frac{\partial}{\partial \tilde{c}} \exp\!\left( \tilde{c}^\top \tilde{\xi} + \tfrac{1}{2} \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)
= \exp\!\left( \tilde{c}^\top \tilde{\xi} + \tfrac{1}{2} \tilde{c}^\top \tilde{\Psi} \tilde{c} \right) \left( \tilde{\xi} + \tilde{\Psi} \tilde{c} \right). \tag{30}
\]

Likewise, the Hessian can be computed as follows:

\[
\frac{\partial^2 O}{\partial \tilde{c}\, \partial \tilde{c}^\top}
= \frac{\partial}{\partial \tilde{c}} \left[ \exp\!\left( \tilde{c}^\top \tilde{\xi} + \tfrac{1}{2} \tilde{c}^\top \tilde{\Psi} \tilde{c} \right) \left( \tilde{\xi} + \tilde{\Psi} \tilde{c} \right)^\top \right]
= \exp\!\left( \tilde{c}^\top \tilde{\xi} + \tfrac{1}{2} \tilde{c}^\top \tilde{\Psi} \tilde{c} \right)
\left[ \left( \tilde{\xi} + \tilde{\Psi} \tilde{c} \right) \left( \tilde{\xi} + \tilde{\Psi} \tilde{c} \right)^\top + \tilde{\Psi} \right]. \tag{31}
\]

For the other choice, g(z) = z, the objective is O = c̃^⊤ξ̃, so the gradient is trivially ξ̃ and the Hessian is the zero matrix. Hence, for h_1(z) = e^z, each term of (19) has the form −∆ e^z + y_{n,i} z, and both the gradient and the Hessian of the M-step objective are available analytically.
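Combining (29)-(31) with the linear case gives a fully analytic Newton ascent for one output channel under the h_1(z) = e^z link. The sketch below is our own construction (names are illustrative), not the report's code.

```python
import numpy as np

def newton_update_channel(c, d, y_i, Xi, Psi, delta, n_iter=25, tol=1e-10):
    """Newton ascent for (c_i, d_i) with the h(z) = e^z link. Per observation
    n, E_{Q_n}[-delta*e^z + y*z] with z ~ N(m_n, v_n) equals
    -delta*exp(m_n + v_n/2) + y_n*m_n (eq. 29); its gradient and Hessian in
    the augmented c~ = [c; d] follow from eqs. (30)-(31) and the linear case.
    Xi: (N, p) posterior means; Psi: (N, p, p) posterior covariances;
    y_i: (N,) counts on this channel."""
    N, p = Xi.shape
    Xt = np.hstack([Xi, np.ones((N, 1))])              # xi~_n = [xi_n; 1]
    Pt = np.zeros((N, p + 1, p + 1))
    Pt[:, :p, :p] = Psi                                # Psi~_n (zero-padded)
    ct = np.concatenate([c, [d]])
    for _ in range(n_iter):
        m = Xt @ ct                                    # m_n = c~^T xi~_n
        v = np.einsum('i,nij,j->n', ct, Pt, ct)        # v_n = c~^T Psi~_n c~
        e = delta * np.exp(m + 0.5 * v)                # delta * E[e^z]
        u = Xt + Pt @ ct                               # xi~_n + Psi~_n c~
        grad = (y_i[:, None] * Xt - e[:, None] * u).sum(axis=0)
        H = -np.einsum('n,ni,nj->ij', e, u, u) - np.einsum('n,nij->ij', e, Pt)
        step = np.linalg.solve(H, grad)
        ct = ct - step                                 # Newton ascent step
        if np.linalg.norm(step) < tol:
            break
    return ct[:p], ct[p]
```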

4 Inference

Once the model parameters have been chosen, the generative model can be used to make inferences on the training data or on new observations. For the training data, the hidden state vector x is the only variable that must be inferred. The posterior distribution of x can be approximated by a Gaussian, exactly as described previously. This results in a distribution Q with mean ξ and covariance Ψ. Therefore, the maximum a posteriori estimate of x is simply ξ.

When performing inference for a new observation, the mixture component identity, s, is assumed to be unknown. The posterior distributions of both s and x, given the data, y, are potentially of interest. The first of these distributions can be expressed as follows:

\[
P(s \mid y, \hat{\theta}) \propto P(y \mid s, \hat{\theta})\, P(s \mid \hat{\theta})
= \pi_s \int_x P(y, x \mid s, \hat{\theta})\, dx
= \pi_s \int_x P(y \mid x, \hat{\theta})\, P(x \mid s, \hat{\theta})\, dx
= \pi_s\, \mathbb{E}_{x \mid s}\!\left[ P(y \mid x, \hat{\theta}) \right], \tag{32}
\]

where the expectation in (32) is of a product of Poissons with respect to a Gaussian distribution that has mean µ̂_s and covariance Σ̂_s. This expectation can be computed using sampling techniques or Laplace's method. To infer x given the data, the following derivation applies:

\[
P(x \mid y, \hat{\theta}) = \sum_{s=1}^{M} P(x \mid y, s, \hat{\theta})\, P(s \mid y, \hat{\theta})
\propto \sum_{s=1}^{M} P(x \mid y, s, \hat{\theta})\, \pi_s\, \mathbb{E}_{x \mid s}\!\left[ P(y \mid x, \hat{\theta}) \right]. \tag{33}
\]
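Equation (32) suggests a simple Monte Carlo estimator for P(s | y). The sketch below (illustrative, assuming the h(z) = e^z link and SciPy's Poisson pmf) averages the likelihood over prior draws of x for each component, working in log space for numerical stability.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import logsumexp

rng = np.random.default_rng(1)

def posterior_component(y, pi, mus, Sigmas, C, d, delta, n_samples=5000):
    """Monte Carlo estimate of P(s | y) per eq. (32): for each component s,
    average P(y | x) over draws x ~ N(mu_s, Sigma_s), weight by pi_s,
    and normalise."""
    M = len(pi)
    log_post = np.zeros(M)
    for s in range(M):
        xs = rng.multivariate_normal(mus[s], Sigmas[s], size=n_samples)
        rates = np.exp(xs @ C.T + d) * delta             # (n_samples, q)
        loglik = poisson.logpmf(y, rates).sum(axis=1)    # product over channels
        log_post[s] = np.log(pi[s]) + logsumexp(loglik) - np.log(n_samples)
    return np.exp(log_post - logsumexp(log_post))
```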

References

[1] A.C. Smith and E.N. Brown. Estimating a state-space model from point process observations. Neural Computation, 15(5):965-991, 2003.

[2] S.J. Julier and J.K. Uhlmann. A new extension of the Kalman filter to nonlinear systems. In Proc. AeroSense: 11th Int. Symp. Aerospace/Defense Sensing, Simulation and Controls, pages 182-193, 1997.

[3] U.N. Lerner. Hybrid Bayesian networks for reasoning about complex systems. PhD thesis, Stanford University, Stanford, CA, 2002.