Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia

Walter H. L. Pinaya *a; Ary Gadelha b; Orla M. Doyle c; Cristiano Noto b; André Zugman d; Quirino Cordeiro b, e; Andrea P. Jackowski b; Rodrigo A. Bressan b; João R. Sato a, b

* a Center of Mathematics, Computation, and Cognition, Universidade Federal do ABC, Santo André, Brazil.
b Department of Psychiatry, Universidade Federal de São Paulo, São Paulo, Brazil.
c Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
d Interdisciplinary Lab for Clinical Neurosciences (LiNC), Universidade Federal de São Paulo, São Paulo, Brazil.
e Department of Psychiatry, Faculdade de Ciências Médicas da Santa Casa de São Paulo, São Paulo, Brazil.

* a Rua Arcturus, 03 - Jardim Antares, São Bernardo do Campo - SP, CEP 09.606-070, Brazil.
b Rua Borges Lagoa, 570 - Vila Clementino, São Paulo - SP, CEP 04.038-020, Brazil.
c Institute of Psychiatry (PO89), King's College London, De Crespigny Park, London SE5 8AF, UK.
d Rua Borges Lagoa, 570 - Vila Clementino, São Paulo - SP, CEP 04.038-020, Brazil.
e Rua Major Maragliano, 241 - Vila Mariana, São Paulo - SP, CEP 04.017-030, Brazil.

Corresponding author: Walter H. L. Pinaya
Phone: +55 11 97123 0508
Email address: walhugolp@gmail.com

Supplementary information

Deep Belief Networks

The deep learning method that we used in this study consisted of a deep neural network pre-trained by a DBN (DBN-DNN). The DBN has gained popularity since the successful implementation of an efficient learning technique that stacks simpler models known as restricted Boltzmann machines (RBMs) 6.

Restricted Boltzmann Machine

The RBM can be interpreted as an artificial neural network that extracts latent features of the unknown input probability distribution based only on observed samples 19. Given some observations, training an RBM means adjusting the model parameters such that the probability distribution represented by it fits the distribution of the training data as well as possible. The RBM network consists of a bipartite graph that has a visible layer and a hidden layer (Fig. 1). The RBM can be defined as an energy-based model, and the joint probability distribution of hidden unit values h and visible unit values v is determined using an energy function E (1).

Figure 1. Restricted Boltzmann Machine (RBM). The graph of an RBM has connections only between the layer of hidden variables (gray circles) and the layer of visible variables (white circles), but not between two units of the same layer. This means that the hidden units are independent of each other given the state of the visible units, and vice versa.

P(v, h) = \frac{1}{Z} \exp(-E(v, h))    equation (1)

Z = \sum_{v} \sum_{h} \exp(-E(v, h))    equation (2)

where the normalizing constant Z is called the partition function by analogy with physical systems. The partition function is obtained by summing over all possible pairs of visible and hidden vectors (2).

The RBM hidden units are typically treated as binary stochastic units (with a Bernoulli distribution). The visible layer can also handle a binary data distribution with Bernoulli units. However, the RBM can also handle a continuous data distribution (like the morphometric data) with Gaussian visible units. These units conditionally follow a Gaussian distribution whose mean is determined by the weighted sum of the states of the hidden units. The RBM that uses this type of visible unit is called a Gaussian-Bernoulli RBM (GRBM). GRBMs can be used to convert the real-valued variables of the DBN input layer into binary stochastic variables, which can then be treated using Bernoulli-Bernoulli RBMs. Thus, the energy function of the Bernoulli-Bernoulli RBM is defined by:

E(v, h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} v_i W_{ij} h_j    equation (3)
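To make equations (1)-(3) concrete, the following minimal numpy sketch (illustrative only, not part of the original analysis code; the array names b, c and W mirror the biases and weights defined above, and the toy dimensions are arbitrary) evaluates the Bernoulli-Bernoulli energy and the corresponding joint probability, computing the partition function by brute-force enumeration, which is feasible only for very small models:

import itertools
import numpy as np

def energy(v, h, b, c, W):
    # Equation (3): E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j
    return -b @ v - c @ h - v @ W @ h

def partition_function(b, c, W):
    # Equation (2): brute-force sum over every binary (v, h) pair;
    # tractable only for very small toy models.
    n_visible, n_hidden = W.shape
    Z = 0.0
    for v in itertools.product([0, 1], repeat=n_visible):
        for h in itertools.product([0, 1], repeat=n_hidden):
            Z += np.exp(-energy(np.array(v), np.array(h), b, c, W))
    return Z

# Toy model with 3 Bernoulli visible units and 2 Bernoulli hidden units
rng = np.random.default_rng(0)
b = rng.normal(size=3)        # visible biases
c = rng.normal(size=2)        # hidden biases
W = rng.normal(size=(3, 2))   # connection weights
Z = partition_function(b, c, W)
v, h = np.array([1, 0, 1]), np.array([0, 1])
p_joint = np.exp(-energy(v, h, b, c, W)) / Z   # Equation (1)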

The energy function of the GRBM can be defined by:

E(v, h) = \frac{1}{2} \sum_{i} (v_i - b_i)^2 - \sum_{j} c_j h_j - \sum_{i,j} v_i W_{ij} h_j    equation (4)

where b_i and c_j are the biases of visible unit i and hidden unit j, respectively, and W_{ij} is the weight parameter of the connection between them. The objective of training is to fit the probability distribution model over a set of visible random variables v to the observed data. Thus, training can be carried out by maximum likelihood estimation of the marginal probability P(v) = \sum_{h} P(v, h). The gradient of the likelihood with respect to the RBM parameters (weights and biases) has a closed form. However, it includes an intractable expectation over the joint distribution of visible and hidden units, P(v, h). Usually, an approximation of the gradient is used to deal with this intractable expectation. A truncated version of the Gibbs sampling method called Contrastive Divergence (CD) 6 uses the conditional probabilities P(v | h) and P(h | v) in this approximation. The popularity of the RBM stems from the efficiency of the CD algorithm and from the ability to compute the conditional distributions over v and h easily. The conditional probabilities of the RBM can be computed as:

P(h_j = 1 \mid v) = \sigma\left(c_j + \sum_{i} v_i W_{ij}\right)    equation (5)

P(v_i = 1 \mid h) = \sigma\left(b_i + \sum_{j} h_j W_{ij}\right)    equation (6)
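As an illustration of how CD uses the conditional probabilities in equations (5) and (6), the sketch below performs a single CD-1 update for a Bernoulli-Bernoulli RBM. It is a simplified sketch under our own naming conventions (cd1_step, lr), not the training code used in this study:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, b, c, W, lr=0.01, rng=None):
    # One CD-1 update for a Bernoulli-Bernoulli RBM.
    # v0: (n_samples, n_visible) batch of binary training vectors.
    if rng is None:
        rng = np.random.default_rng(0)

    # Positive phase, equation (5): P(h_j = 1 | v)
    ph0 = sigmoid(c + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase, one Gibbs step: equation (6) then equation (5)
    pv1 = sigmoid(b + h0 @ W.T)   # P(v_i = 1 | h)
    ph1 = sigmoid(c + pv1 @ W)

    # Approximate gradient: data statistics minus reconstruction statistics
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b = b + lr * (v0 - pv1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return b, c, W

In practice, many such updates are applied over mini-batches of the training data until the parameters converge.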

Similarly, for a GRBM, the corresponding conditional probability of the visible units becomes:

P(v_i \mid h) = \mathcal{N}\left(b_i + \sum_{j} h_j W_{ij};\ 1\right)    equation (7)

where \sigma(x) = 1/(1 + e^{-x}) is the logistic sigmoid function and the normal distribution is denoted by \mathcal{N}(mean; variance). Further information on the RBM model and its training can be found in 6,19.

Creating Deep Belief Networks

After training, the hidden unit values of the RBM provide a closed-form representation of the dependencies between the visible units. The idea is that the hidden units extract relevant features from the observations. However, these features are regarded as low-level features. To achieve more complex representations, the model needs to compute higher-level features based on the lower-level ones. We therefore create a DBN by stacking RBMs 6. The stacking procedure is as follows. After training a GRBM with the continuous input data, we treat the activation probabilities of its hidden units as the input data to train a Bernoulli-Bernoulli RBM one layer up. Similarly, the hidden unit activation probabilities of the second-layer RBM are used as the input for the next RBM, and so on until the desired depth is reached. By stacking RBMs, the DBN can learn a hierarchical structure of the input data.
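The greedy stacking procedure can be summarized by the following sketch. The helper train_rbm is a hypothetical callable standing in for the single-RBM training described above; the sketch only shows how the hidden activation probabilities of each layer become the input of the next:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_dbn(X, layer_sizes, train_rbm):
    # Greedy layer-wise pre-training of a DBN.
    # X           : (n_samples, n_features) real-valued input (e.g. morphometric data)
    # layer_sizes : number of hidden units per layer, from bottom to top
    # train_rbm   : hypothetical callable that trains one RBM on the given data and
    #               returns its parameters (b, c, W); the first call would fit a
    #               Gaussian-Bernoulli RBM, the remaining calls Bernoulli-Bernoulli RBMs
    layers, data = [], X
    for n_hidden in layer_sizes:
        b, c, W = train_rbm(data, n_hidden)
        layers.append((b, c, W))
        # The hidden activation probabilities become the input of the next RBM
        data = sigmoid(c + data @ W)
    return layers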

This pre-training can be followed by a discriminative training that fine-tunes all layers jointly to perform the classification task. The fine-tuning is done by initializing the parameters of a deep neural network with the values of the DBN pre-trained parameters. In addition, a final layer (composed of softmax units) is added to implement the desired targets of the training data, the labels SCZ and HC. Finally, the backpropagation algorithm and a gradient-based optimization algorithm can be used to adjust the network parameters, creating a DBN-DNN (see the illustrative sketch after Table 1).

Detailed information on the selection of the optimal DBN-DNN models

Table 1. The AUC-ROC of the DBN-DNN classifiers during the search for the optimal number of hidden layers.

#  Cross validation    1 Layer   2 Layers  3 Layers  4 Layers  5 Layers
1  1                   0.8697    0.8889    0.8640    0.8649    0.8640
1  2                   0.8067    0.7858    0.8008    0.8392    0.6892
1  3                   0.8778    0.8704    0.8269    0.8417    0.8093
2  1                   0.7339    0.7688    0.7839    0.6491    0.7304
2  2                   0.8121    0.8030    0.8924    0.7441    0.8076
2  3                   0.7294    0.7301    0.6934    0.7902    0.7441
3  1                   0.9174    0.9104    0.9132    0.7692    0.9062
3  2                   0.8269    0.8278    0.8295    0.7631    0.7019
3  3                   0.7540    0.7692    0.7596    0.7917    0.7628
4  1                   0.7738    0.8185    0.7554    0.7900    0.7677
4  2                   0.7750    0.8033    0.8383    0.7600    0.7875
4  3                   0.7617    0.7200    0.7258    0.7139    0.7200
5  1                   0.7950    0.6091    0.7723    0.6273    0.7662
5  2                   0.7304    0.7441    0.7308    0.7471    0.7981
5  3                   0.7662    0.7628    0.7485    0.7500    0.6371
Mean                   0.7953    0.7875    0.7957    0.7628    0.7661
Standard deviation     0.0570    0.0747    0.0639    0.0651    0.0681
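For completeness, the sketch below illustrates the fine-tuning architecture described before Table 1: sigmoid hidden layers initialized with the pre-trained RBM parameters, followed by a softmax output layer for the SCZ/HC labels. The output-layer parameters V and d and the function names are our own illustrative choices, not the implementation used in the study:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dbn_dnn_forward(X, layers, V, d):
    # Forward pass of the DBN-DNN: sigmoid hidden layers whose weights W and biases c
    # come from the pre-trained RBMs (the visible biases b are not reused), followed by
    # a softmax output layer with weights V and biases d for the two classes (SCZ, HC).
    a = X
    for b, c, W in layers:
        a = sigmoid(c + a @ W)
    return softmax(d + a @ V)

# The cross-entropy loss between these class probabilities and the SCZ/HC labels would
# then be minimized with backpropagation and a gradient-based optimizer, jointly
# adjusting c, W, V and d.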