Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia

Walter H. L. Pinaya *a; Ary Gadelha b; Orla M. Doyle c; Cristiano Noto b; André Zugman d; Quirino Cordeiro b, e; Andrea P. Jackowski b; Rodrigo A. Bressan b; João R. Sato a, b

* a Center of Mathematics, Computation, and Cognition, Universidade Federal do ABC, Santo André, Brazil.
b Department of Psychiatry, Universidade Federal de São Paulo, São Paulo, Brazil.
c Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
d Interdisciplinary Lab for Clinical Neurosciences (LiNC), Universidade Federal de São Paulo, São Paulo, Brazil.
e Department of Psychiatry, Faculdade de Ciências Médicas da Santa Casa de São Paulo, São Paulo, Brazil.

* a Rua Arcturus, 03 - Jardim Antares, São Bernardo do Campo - SP, CEP 09.606-070, Brazil.
b Rua Borges Lagoa, 570 - Vila Clementino, São Paulo - SP, CEP 04.038-020, Brazil.
c Institute of Psychiatry (PO89), King's College London, De Crespigny Park, London SE5 8AF, UK.
d Rua Borges Lagoa, 570 - Vila Clementino, São Paulo - SP, CEP 04.038-020, Brazil.
e Rua Major Maragliano, 241 - Vila Mariana, São Paulo - SP, CEP 04.017-030, Brazil.

Corresponding author: Walter H. L. Pinaya
Phone: +55 11 97123 0508
Email address: walhugolp@gmail.com

Supplementary information

Deep Belief Networks

The deep learning method that we used in this study consisted of a deep neural network pre-trained by a DBN (DBN-DNN). The DBN has gained popularity since the successful implementation of an efficient learning technique that stacks simpler models known as restricted Boltzmann machines (RBMs) 6.

Restricted Boltzmann Machine

The RBM can be interpreted as an artificial neural network that extracts latent features of the unknown input probability distribution based only on observed samples 19. Given some observations, training an RBM means adjusting the model parameters such that the probability distribution represented by it fits the distribution of the training data as well as possible. The RBM network consists of a bipartite graph that has a visible layer and a hidden layer (Fig. 1). The RBM can be defined as an energy-based model, and the joint probability distribution of hidden unit values h and visible unit values v is determined using an energy function E (1).

Figure 1. Restricted Boltzmann Machine (RBM). The graph of an RBM has connections only between the layer of hidden variables (gray circles) and the layer of visible variables (white circles), but not between two units of the same layer. This means that the hidden units are independent of each other given the state of the visible units, and vice versa.

P(v, h) = \frac{1}{Z} \exp(-E(v, h))    equation (1)

Z = \sum_{v} \sum_{h} \exp(-E(v, h))    equation (2)

where the normalizing constant Z is called the partition function by analogy with physical systems. The partition function is obtained by summing over all possible pairs of visible and hidden vectors (2).

The RBM hidden units are typically treated as binary stochastic units (with a Bernoulli distribution). The visible layer can also handle a binary data distribution with Bernoulli units. However, the RBM can also handle a continuous data distribution (like the morphometric data) with Gaussian visible units. These units conditionally follow a Gaussian distribution whose mean is determined by the weighted sum of the states of the hidden units. The RBM that uses this type of visible unit is called a Gaussian-Bernoulli RBM (GRBM). GRBMs can be used to convert the real-valued variables of the DBN input layer into binary stochastic variables, which can then be treated using Bernoulli-Bernoulli RBMs. Thus, the energy function of the Bernoulli-Bernoulli RBM is defined by:

E(v, h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} v_i W_{ij} h_j    equation (3)
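To make equations (1)-(3) concrete, the following minimal numpy sketch (illustrative only, not part of the original analysis code; the array names b, c and W mirror the biases and weights defined above, and the toy dimensions are arbitrary) evaluates the Bernoulli-Bernoulli energy and the corresponding joint probability, computing the partition function by brute-force enumeration, which is feasible only for very small models:

import itertools
import numpy as np

def energy(v, h, b, c, W):
    # Equation (3): E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j
    return -b @ v - c @ h - v @ W @ h

def partition_function(b, c, W):
    # Equation (2): brute-force sum over every binary (v, h) pair;
    # tractable only for very small toy models.
    n_visible, n_hidden = W.shape
    Z = 0.0
    for v in itertools.product([0, 1], repeat=n_visible):
        for h in itertools.product([0, 1], repeat=n_hidden):
            Z += np.exp(-energy(np.array(v), np.array(h), b, c, W))
    return Z

# Toy model with 3 Bernoulli visible units and 2 Bernoulli hidden units
rng = np.random.default_rng(0)
b = rng.normal(size=3)        # visible biases
c = rng.normal(size=2)        # hidden biases
W = rng.normal(size=(3, 2))   # connection weights
Z = partition_function(b, c, W)
v, h = np.array([1, 0, 1]), np.array([0, 1])
p_joint = np.exp(-energy(v, h, b, c, W)) / Z   # Equation (1)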

The energy function of the GRBM can be defined by:

E(v, h) = \frac{1}{2} \sum_{i} (v_i - b_i)^2 - \sum_{j} c_j h_j - \sum_{i,j} v_i W_{ij} h_j    equation (4)

where b_i and c_j are the biases of visible unit i and hidden unit j, respectively, and W_{ij} is the weight parameter of the connection between them. The objective of training is to fit the probability distribution model over a set of visible random variables v to the observed data. Thus, training can be carried out by maximum likelihood estimation of the marginal probability P(v) = \sum_{h} P(v, h). The gradient of the likelihood with respect to the RBM parameters (weights and biases) has a closed form. However, it includes an intractable expectation over the joint distribution of visible and hidden units, P(v, h). Usually, an approximation of the gradient is used to deal with this intractable expectation. A truncated version of the Gibbs sampling method called Contrastive Divergence (CD) 6 uses the conditional probabilities P(v | h) and P(h | v) in this approximation. The popularity of the RBM stems from the efficiency of the CD algorithm and from the ability to compute the conditional distributions over v and h easily. The conditional probabilities of the RBM can be computed as:

P(h_j = 1 \mid v) = \sigma\left(c_j + \sum_{i} v_i W_{ij}\right)    equation (5)

P(v_i = 1 \mid h) = \sigma\left(b_i + \sum_{j} h_j W_{ij}\right)    equation (6)
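As an illustration of how CD uses the conditional probabilities in equations (5) and (6), the sketch below performs a single CD-1 update for a Bernoulli-Bernoulli RBM. It is a simplified sketch under our own naming conventions (cd1_step, lr), not the training code used in this study:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, b, c, W, lr=0.01, rng=None):
    # One CD-1 update for a Bernoulli-Bernoulli RBM.
    # v0: (n_samples, n_visible) batch of binary training vectors.
    if rng is None:
        rng = np.random.default_rng(0)

    # Positive phase, equation (5): P(h_j = 1 | v)
    ph0 = sigmoid(c + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase, one Gibbs step: equation (6) then equation (5)
    pv1 = sigmoid(b + h0 @ W.T)   # P(v_i = 1 | h)
    ph1 = sigmoid(c + pv1 @ W)

    # Approximate gradient: data statistics minus reconstruction statistics
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b = b + lr * (v0 - pv1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return b, c, W

In practice, many such updates are applied over mini-batches of the training data until the parameters converge.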

Similarly, for a GRBM, the corresponding conditional probability of the visible units becomes:

P(v_i \mid h) = \mathcal{N}\left(b_i + \sum_{j} h_j W_{ij};\ 1\right)    equation (7)

where \sigma(x) = 1/(1 + e^{-x}) is the logistic sigmoid function and the normal distribution is denoted by \mathcal{N}(mean; variance). Further information on the RBM model and its training can be found in 6,19.

Creating Deep Belief Networks

After training, the hidden unit values of the RBM provide a closed-form representation of the dependencies between the visible units. The idea is that the hidden units extract relevant features from the observations. However, these features are regarded as low-level features. To achieve more complex representations, the model needs to compute higher-level features based on the lower-level ones. We therefore create a DBN by stacking RBMs 6. The stacking procedure is as follows. After training a GRBM with the continuous input data, we treat the activation probabilities of its hidden units as the input data to train a Bernoulli-Bernoulli RBM one layer up. Similarly, the hidden unit activation probabilities of the second-layer RBM are used as the input for the next RBM, and so on until the desired depth is reached. By stacking RBMs, the DBN can learn a hierarchical structure of the input data.
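The greedy stacking procedure can be summarized by the following sketch. The helper train_rbm is a hypothetical callable standing in for the single-RBM training described above; the sketch only shows how the hidden activation probabilities of each layer become the input of the next:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_dbn(X, layer_sizes, train_rbm):
    # Greedy layer-wise pre-training of a DBN.
    # X           : (n_samples, n_features) real-valued input (e.g. morphometric data)
    # layer_sizes : number of hidden units per layer, from bottom to top
    # train_rbm   : hypothetical callable that trains one RBM on the given data and
    #               returns its parameters (b, c, W); the first call would fit a
    #               Gaussian-Bernoulli RBM, the remaining calls Bernoulli-Bernoulli RBMs
    layers, data = [], X
    for n_hidden in layer_sizes:
        b, c, W = train_rbm(data, n_hidden)
        layers.append((b, c, W))
        # The hidden activation probabilities become the input of the next RBM
        data = sigmoid(c + data @ W)
    return layers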

This pre-training can be followed by a discriminative training that fine-tunes all layers jointly to perform the classification task. The fine-tuning is done by initializing the parameters of a deep neural network with the values of the DBN pre-trained parameters. In addition, a final layer (composed of softmax units) is added to implement the desired targets of the training data, the labels SCZ and HC. Finally, the backpropagation algorithm and a gradient-based optimization algorithm can be used to adjust the network parameters, creating a DBN-DNN (see the illustrative sketch after Table 1).

Detailed information on the selection of the optimal DBN-DNN models

Table 1. The AUC-ROC of the DBN-DNN classifiers during the search for the optimal number of hidden layers.

#  Cross validation    1 Layer   2 Layers  3 Layers  4 Layers  5 Layers
1  1                   0.8697    0.8889    0.8640    0.8649    0.8640
1  2                   0.8067    0.7858    0.8008    0.8392    0.6892
1  3                   0.8778    0.8704    0.8269    0.8417    0.8093
2  1                   0.7339    0.7688    0.7839    0.6491    0.7304
2  2                   0.8121    0.8030    0.8924    0.7441    0.8076
2  3                   0.7294    0.7301    0.6934    0.7902    0.7441
3  1                   0.9174    0.9104    0.9132    0.7692    0.9062
3  2                   0.8269    0.8278    0.8295    0.7631    0.7019
3  3                   0.7540    0.7692    0.7596    0.7917    0.7628
4  1                   0.7738    0.8185    0.7554    0.7900    0.7677
4  2                   0.7750    0.8033    0.8383    0.7600    0.7875
4  3                   0.7617    0.7200    0.7258    0.7139    0.7200
5  1                   0.7950    0.6091    0.7723    0.6273    0.7662
5  2                   0.7304    0.7441    0.7308    0.7471    0.7981
5  3                   0.7662    0.7628    0.7485    0.7500    0.6371
Mean                   0.7953    0.7875    0.7957    0.7628    0.7661
Standard deviation     0.0570    0.0747    0.0639    0.0651    0.0681
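For completeness, the sketch below illustrates the fine-tuning architecture described before Table 1: sigmoid hidden layers initialized with the pre-trained RBM parameters, followed by a softmax output layer for the SCZ/HC labels. The output-layer parameters V and d and the function names are our own illustrative choices, not the implementation used in the study:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dbn_dnn_forward(X, layers, V, d):
    # Forward pass of the DBN-DNN: sigmoid hidden layers whose weights W and biases c
    # come from the pre-trained RBMs (the visible biases b are not reused), followed by
    # a softmax output layer with weights V and biases d for the two classes (SCZ, HC).
    a = X
    for b, c, W in layers:
        a = sigmoid(c + a @ W)
    return softmax(d + a @ V)

# The cross-entropy loss between these class probabilities and the SCZ/HC labels would
# then be minimized with backpropagation and a gradient-based optimizer, jointly
# adjusting c, W, V and d.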