Sparsely Connected Autoencoder


Kavya Gupta, IIIT Delhi, New Delhi, India
Angshul Majumdar, IIIT Delhi, New Delhi, India

Abstract: This work proposes to learn autoencoders with sparse connections. Prior studies on autoencoders enforced sparsity on the neuronal activity; these are different from our proposed approach, in which we learn sparse connections. Sparsity in connections helps in learning (and keeping) the important relations while trimming the irrelevant ones. We have tested the performance of our proposed method on two tasks: classification and denoising. For classification we have compared against stacked autoencoders, contractive autoencoders, the deep belief network, the sparse deep neural network and the optimal brain damage neural network; the denoising performance was compared against the denoising autoencoder and the sparse (activity) autoencoder. In both tasks our proposed method yields superior results.

Index Terms: autoencoder, classification, sparsity, denoising

I. INTRODUCTION

There is a plethora of work on sparse neural networks. Broadly it can be segregated into i) sparse activity and ii) sparse connectivity. Sparsity can arise in two contexts. The sparse activity property means that only a small fraction of neurons is active at any time. The sparse connectivity property means that each neuron is connected to only a limited number of other neurons.

Since their onset, neural networks have been claimed to mimic the human brain. For a certain activity / task only a portion of the brain (neurons) is active. The whole brain is never used. In fact, it is a widely circulated myth that we use only 10% of our brain; the myth is untrue. But it is well known that only a certain portion of the brain is active for a certain task; i.e. given the whole brain, only a sparse set of neurons is actually active (for the given task). Therefore, if indeed the neural network is an approximate representative of the brain, we would expect it to have sparse connections. This aspect was captured by LeCun's work on optimal brain damage [1]. He devised a technique to trim connections of a neural network (hence the name brain damage) without degrading its performance.

In recent times the idea has been revisited. In [2, 3] sparsity is enforced both on the activity and on the connectivity. In [4] sparsity on activity was promoted for the rectifier neural network; it was used in [5] to improve the performance on phone recognition. Sparsity in connections was exploited in [6] for the problem of speech recognition. In [7] and [8] sparsity in connections is enforced on the convolutional neural network and the recurrent neural network respectively. In most of these studies the common observation is that the introduction of sparsity leads to a slight dip in performance but reduces the complexity of the network significantly.

In this work we are specifically interested in autoencoders. Although there has been a lot of work on sparse connectivity and sparse activity in neural networks, all prior studies on sparse autoencoders enforced sparsity on the activity; there is no prior study that promoted sparsity in connections. This is the first work to do so. In [9] sparsity was introduced in terms of firing neurons: if a neuron is of high value (near about 1), it is allowed to fire; the rest are not. In [10], only the top K high valued neurons are fired; in [11] only the neurons beyond a predefined threshold were fired. In [12], different sparsity promoting terms (on activities) were compared; these were the KL divergence [9] and variations of the l1-norm [11]. It was shown in [13] that by combining the output of several such sparse autoencoders (trained as in [9]), one is able to improve performance on several image recognition tasks.

We compare our proposed sparsely connected autoencoder with several variants of autoencoders (for classification and denoising) and the deep belief network (for classification). We find that our proposed technique yields better results compared to existing techniques.
Although most readers of this paper will be abreast with the literature on neural networks in general and autoencoders in particular, we provide a brief review of autoencoders for the sake of completeness in the next section. In Section III, our proposed method is described in detail. The experimental evaluation is reported in Section IV. The conclusions of this work are discussed in Section V.

II. BACKGROUND

Fig. 1. Single Layer Autoencoder (input layer, hidden layer, output layer).

An autoencoder (as seen in Fig. 1) consists of two parts: the encoder maps the input to a latent space, and the decoder maps the latent representation back to the data. For a given input vector (including the bias term) x, the latent space is expressed as:

h = Wx   (1)

Here the rows of W are the link weights from all the input nodes to the corresponding latent node. The mapping can be linear, but in most cases it is non-linear (sigmoid, tanh etc.):

h = φ(Wx)   (2)

The decoder portion reverse maps the latent features to the data space:

x' = W'φ(Wx)   (3)

Since the data space is assumed to be the space of real numbers, there is no sigmoidal function here. During training, the problem is to learn the encoding and decoding weights W and W'. This is achieved by minimizing the Euclidean cost:

argmin_{W,W'} ||X - W'φ(WX)||_F^2   (4)

Here X = [x_1 | ... | x_N] consists of all the training samples stacked as columns. The problem (4) is clearly non-convex. However, it is solved easily by gradient descent techniques since the sigmoid function is smooth and continuously differentiable.

Fig. 2. Stacked Autoencoder (input layer, hidden layers 1 to L, output layer).

There are several extensions to the basic autoencoder architecture. Stacked / deep autoencoders [9] have multiple hidden layers (see Fig. 2). The corresponding cost function is expressed as follows:

argmin_{W_1...W_L, W'_1...W'_L} ||X - g(f(X))||_F^2   (5)

where f(X) = φ(W_L φ(W_{L-1} ... φ(W_1 X))) and g(.) = φ(W'_1 φ(W'_2 ... φ(W'_L .))).

Solving the complete problem (5) is computationally challenging. The weights are usually learned in a greedy fashion, one layer at a time [14].

Stacked denoising autoencoders (SDAE) [15] are a variant of the basic autoencoder where the input consists of noisy samples and the output consists of clean samples. Here the encoder and decoder are learnt to denoise noisy input samples. The learned features appear to be more robust when learnt by SDAE.

In a recent work a marginalized denoising autoencoder was proposed [16], which does not have any intermediate nodes but learns a mapping from the input to the output. This formulation is convex (unlike regular autoencoders); the trick here is to marginalize over all possible noisy samples so that the dataset need not be augmented as in SDAE. Such an autoencoder was used for domain adaptation.

Another variation of the basic autoencoder is to regularize it, i.e.

argmin_W ||X - g(f(X))||_F^2 + R(W, X)   (6)

The regularization can be a simple Tikhonov regularization; however that is not used in practice. It can be a sparsity promoting term [9]-[11], or a weight decay term; the Frobenius norm of the Jacobian is used in the contractive autoencoder [17]. The regularization terms are usually chosen so that they are differentiable and hence can be minimized using gradient descent techniques.

III. PROPOSED SPARSECONNECT AUTOENCODER

Autoencoders usually have a non-linear activation function. However, in [18] it was shown that an autoencoder usually operates in the linear region. Therefore in this work, we will use a linear activation function. This allows us to derive a more efficient algorithm which is faster than its non-linear counterparts. We also show (experimentally) that the linear autoencoder yields better results than its non-linear counterpart. The basic formulation of an autoencoder with a linear activation function is given by:

argmin_{W,W'} ||X - W'WX||_F^2   (7)

The basic autoencoder is prone to overfitting, especially when the number of training samples is limited. Denoising autoencoders use a stochastic regularization technique. However, given the Euclidean cost function of the autoencoder, a more direct way to regularize it would be to incorporate penalty terms into the basic formulation. For example, a contractive autoencoder with a linear activation function would lead to the following formulation:

argmin_{W,W'} ||X - W'WX||_F^2 + λ(||W||_F^2 + ||W'||_F^2)   (8)

The Frobenius norm on the weights regularizes the network to have small values. The regularization prevents overfitting of the network. We propose to regularize the autoencoder such that it has sparse connections both at the encoder and at the decoder.
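Because (7) has a linear encoder and decoder, its minimizer can be read off from the SVD of the training matrix (the classical equivalence between linear autoencoders and the top principal subspace). The following NumPy sketch is our illustration, not the authors' code; the toy shapes and data are assumptions.

import numpy as np

def linear_autoencoder(X, n_hidden):
    """X: (n_features, n_samples). Returns encoder W and decoder W'."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_h = U[:, :n_hidden]     # top n_hidden left singular vectors of X
    W = U_h.T                 # encoder: projects onto the principal subspace
    W_dec = U_h               # decoder: maps back to data space
    return W, W_dec

X = np.random.rand(64, 200)   # toy data: 64-dim inputs, 200 samples
W, W_dec = linear_autoencoder(X, n_hidden=32)
err = np.linalg.norm(X - W_dec @ (W @ X)) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}")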
The idea of trimming irrelevant connections in neural networks is not new; it was first proposed back in 1990 in the form of optimal brain damage [1]. In recent times, learning sparse structures in neural networks has gained momentum [2]-[8]. However, to the best of our knowledge, there is no work that learns autoencoders with sparse connections. Prior studies on sparse autoencoders [9]-[12] concentrate on sparse activities, not on sparse connections. In this respect ours is the first work to propose sparsely connected autoencoders.

Just as the human brain does not require all its neurons for a specific task, we postulate that an autoencoder does not need to utilize all its connections either. The issue of maintaining important connections without over-fitting is taken care of if we have sparse weights. The portions which are not useful for representation are pruned and only the important connections in the network are maintained. Such sparse connections are easily achieved from the following proposed formulation:

argmin_{W,W'} ||X - W'WX||_F^2 + λ(||W||_{1/0} + ||W'||_{1/0})   (9)

We have abused the notation a bit: the subscript 1/0 denotes either an l1-norm or an l0-norm and is defined on the vectorial representation of the weights. The l1-norm is convex, and has been widely used in recent times by Compressed Sensing [19], [20]. But the l1-norm does not ideally yield sparse weights; the l0-norm does. Unfortunately l0-norm minimization is an NP hard problem [21]. However there are approximate techniques to solve such l0-minimization problems.

Autoencoders with a non-linear activation function are solved using gradient descent techniques. Such techniques are not directly applicable to our proposed formulation. This is because the l1/l0-norm penalties are not differentiable everywhere. In this work we follow a Majorization-Minimization approach to solve the said problem.

A. Majorization Minimization

Fig. 3. Majorization-Minimization [22]: panels (a), (b) and (c).

Fig. 3 shows the geometrical interpretation behind the Majorization-Minimization (MM) approach. The figure depicts the solution path for a simple scalar problem but essentially captures the MM idea. Let J(x) be the function to be minimized. Start with an initial point (at k = 0) x_k (Fig. 3a). A smooth function G_k(x) is constructed through x_k which has a higher value than J(x) for all values of x apart from x_k, at which the values are the same. This is the Majorization step. The function G_k(x) is constructed such that it is smooth and easy to minimize. At each step, minimize G_k(x) to obtain the next iterate x_{k+1} (Fig. 3b). A new G_{k+1}(x) is constructed through x_{k+1} which is now minimized to obtain the next iterate x_{k+2} (Fig. 3c). As can be seen, the solution at every iteration gets closer to the actual solution.

For convenience we express the problem (9) in a slightly different manner, in terms of transposes:

argmin_H ||X^T - X^T H||_F^2 + R(H)   (10)

Here H = W^T W'^T and R(H) denotes the penalty. Only the least squares part needs to be majorized; the penalty terms are not affected.

J(H) = ||X^T - X^T H||_F^2 + R(H)   (11)

For this minimization problem, G_k(H), the majorizer of J(H), is chosen to be

G_k(H) = ||X^T - X^T H||_F^2 + (H - H_k)^T (αI - XX^T)(H - H_k) + R(H)   (12)

where α is the maximum eigenvalue of the matrix XX^T and I is the identity. One can check that at H = H_k the expression G_k(H) reduces to J(H). At all other points it is at least as large as J(H); the value of α assures that the second term is positive semi-definite. Expanding (12) and collecting the terms that depend on H gives

G_k(H) = α(H^T H - 2B^T H) + C + R(H)

where B = H_k + (1/α) X(X^T - X^T H_k) and C consists of terms independent of H. Using the identity ||B - H||^2 = B^T B - 2B^T H + H^T H, one can write

G_k(H) = α||B - H||_F^2 + R(H) + K

where K consists of terms independent of H. Therefore, minimizing G_k(H) is the same as minimizing the following:

G'_k(H) = ||B - H||_F^2 + (1/α) R(H)   (13)

where B = H_k + (1/α) X(X^T - X^T H_k). This update is known as the Landweber iteration.
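The majorization step only requires computing the Landweber point B, after which (13) is a simple proximal problem in H. A minimal NumPy sketch of this step, with toy shapes as assumptions:

import numpy as np

def landweber_point(X, H_k):
    """B = H_k + (1/alpha) X (X^T - X^T H_k), with alpha the maximum
    eigenvalue of X X^T, per the majorizer (12)-(13)."""
    alpha = np.linalg.norm(X @ X.T, 2)   # spectral norm = max eigenvalue (PSD)
    return H_k + X @ (X.T - X.T @ H_k) / alpha

X = np.random.rand(64, 200)   # toy data: 64 features, 200 samples
H = np.zeros((64, 64))        # current iterate H_k
B = landweber_point(X, H)     # target of the proximal problem (13)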

B. l1-norm penalty

First we derive the algorithm for solving the l1-norm minimization problem:

argmin_{W,W'} ||B - W^T W'^T||_F^2 + λ(||W||_1 + ||W'||_1)   (14)

This is a bilinear problem and we propose to solve it alternately, i.e. we fix W' and solve for W, and then solve for W' assuming W is fixed. These two steps are done in every iteration.

W_{k+1} = argmin_W ||B - W^T W'_k^T||_F^2 + λ||W||_1   (15)

W'_{k+1} = argmin_{W'} ||B - W_{k+1}^T W'^T||_F^2 + λ||W'||_1   (15b)

In both cases, the problem remains the same: that of least squares minimization with an l1-norm penalty. Let us take the first problem and work out the solution for it; the solution for the other problem will remain the same. To solve (15) we invoke the majorization approach once again. Therefore (15) can be expressed as

argmin_W ||P - W^T||_F^2 + (λ/α)||W||_1   (16)

where P = W_k^T + (1/α)(B - W_k^T W'^T)W' and α is the maximum eigenvalue of W'^T W'. The above function (16) is actually de-coupled, i.e. it can be written element-wise:

Σ_i (P_i - w_i)^2 + (λ/α)|w_i|   (17)

Therefore, (17) can be minimized term by term. Setting the (sub)derivative of each term to zero gives

2(w_i - P_i) + (λ/α) signum(w_i) = 0   (18)

Solving for w_i gives the graph shown in Fig. 4 with threshold τ = λ/(2α). That is, the minimizer of (16) is obtained by applying the soft-threshold rule to P with threshold τ. The soft-threshold rule is the non-linear function defined as

soft(x, τ) = x - τ for x ≥ τ; 0 for |x| < τ; x + τ for x ≤ -τ   (19)

Or more compactly,

W = signum(P) max(0, |P| - τ)   (20)

Fig. 4. Soft Threshold Rule (with τ = 2).

This concludes the steps for solving (15); the steps for (15b) are exactly the same. In compact fashion, the algorithm for solving the l1-norm penalty problem is given as:

Initialization: H_0 = argmin_H ||X^T - X^T H||_F^2; compute the SVD H_0 = USV^T and set W_0^T = US and W'_0^T = V^T.
In every iteration k:
  Compute B = H_k + (1/α) X(X^T - X^T H_k)
  Update W_{k+1} = argmin_W ||B - W^T W'_k^T||_F^2 + λ||W||_1:
    P = W_k^T + (1/α)(B - W_k^T W'_k^T)W'_k
    W_{k+1}^T = signum(P) max(0, |P| - λ/(2α))
  Similarly update W'_{k+1} = argmin_{W'} ||B - W_{k+1}^T W'^T||_F^2 + λ||W'||_1

Our initialization is deterministic, hence the results are repeatable: there is no variation between trials as long as the other parameters remain the same.

C. l0-norm penalty

The l1-norm penalty is basically a shrinkage function defined by the soft thresholding. It cannot get an exactly sparse solution; it only shrinks the values of unwanted weights. To get a sparse solution in every iteration, one needs to solve the l0-norm minimization problem. This is an NP hard problem but has approximate solutions. The more common practice is to solve (21) using a greedy approach based on Orthogonal Matching Pursuit [23], [24]. However, these are not efficient for solving large scale problems: to solve a k-sparse problem, k iterations are required. A better approach to solve (21) is based on Iterative Hard Thresholding [25].

argmin_{W,W'} ||B - W^T W'^T||_F^2 + λ(||W||_0 + ||W'||_0)   (21)

We solve it via alternating minimization:

W_{k+1} = argmin_W ||B - W^T W'_k^T||_F^2 + λ||W||_0   (22)

W'_{k+1} = argmin_{W'} ||B - W_{k+1}^T W'^T||_F^2 + λ||W'||_0   (22b)

As before, both the problems remain the same. We only derive the algorithm to solve (22). To solve it, we invoke the majorization approach once again. Therefore (22) can be expressed as

argmin_W ||P - W^T||_F^2 + (λ/α)||W||_0   (23)

where P = W_k^T + (1/α)(B - W_k^T W'^T)W'. This is a decoupled problem and can be expressed as

(P_1 - w_1)^2 + (λ/α)|w_1|_0 + ... + (P_n - w_n)^2 + (λ/α)|w_n|_0   (24)

We can process (24) element-wise. To derive the minimum, two cases need to be considered: case 1, w_i = 0, and case 2, w_i ≠ 0. The element-wise penalty is 0 in the first case. For the second case, the minimum is reached when w_i = P_i. Comparing the cost in both cases:

cost = P_i^2 if w_i = 0; cost = λ/α if w_i = P_i

This suggests the following update rule:

w_i = P_i when |P_i| ≥ sqrt(λ/α); w_i = 0 when |P_i| < sqrt(λ/α)

This is popularly known as hard thresholding and is represented as:

W_{k+1} = HardTh(P, sqrt(λ/α))   (25)

This leads to an algorithm somewhat similar to the previous one. It is succinctly represented below.

Initialization: H_0 = argmin_H ||X^T - X^T H||_F^2; compute the SVD H_0 = USV^T and set W_0^T = US and W'_0^T = V^T.
In every iteration k:
  Compute B = H_k + (1/α) X(X^T - X^T H_k)
  Update W_{k+1} = argmin_W ||B - W^T W'_k^T||_F^2 + λ||W||_0:
    P = W_k^T + (1/α)(B - W_k^T W'_k^T)W'_k
    W_{k+1}^T = HardTh(P, sqrt(λ/α))
  Similarly update W'_{k+1} = argmin_{W'} ||B - W_{k+1}^T W'^T||_F^2 + λ||W'||_0
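Both inner updates act elementwise on the Landweber point P of the corresponding subproblem. The NumPy sketch below is our illustration of the rules (20) and (25); the threshold constants follow the derivation above, and all sizes and values are illustrative assumptions.

import numpy as np

def soft_threshold(P, tau):
    """l1 rule (20): shrink every entry of P towards zero by tau."""
    return np.sign(P) * np.maximum(0.0, np.abs(P) - tau)

def hard_threshold(P, tau):
    """l0 rule (25): keep entries with |P_ij| >= tau, zero the rest."""
    return np.where(np.abs(P) >= tau, P, 0.0)

def landweber_point(W_enc, W_dec, B):
    """Majorization step for the encoder subproblem (15)/(22):
    P = W^T + (1/alpha)(B - W^T W'^T) W', alpha = max eigenvalue of W'^T W'."""
    A, D = W_enc.T, W_dec.T                 # A = W^T, D = W'^T
    alpha = np.linalg.norm(D @ D.T, 2)      # largest eigenvalue (PSD matrix)
    return A + (B - A @ D) @ D.T / alpha, alpha

rng = np.random.default_rng(0)
B = rng.random((64, 64))                    # current Landweber target for H
W_enc, W_dec = rng.random((32, 64)), rng.random((64, 32))
P, alpha = landweber_point(W_enc, W_dec, B)
lam = 0.01
W_l1 = soft_threshold(P, lam / (2 * alpha)).T     # l1 update of the encoder
W_l0 = hard_threshold(P, np.sqrt(lam / alpha)).T  # l0 update of the encoder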

IV. EXPERIMENTAL EVALUATION

The MNIST digit classification task is composed of 28x28 images of the 10 handwritten digits. There are 60,000 training images and 10,000 test images in this benchmark. The images are scaled to [0,1] and we do not perform any other preprocessing. Experiments are also carried out on the more challenging variations of the MNIST dataset. These have been used in [11] among others and were introduced as benchmark deep learning datasets. All these datasets have 12,000 training samples (we do not need validation) and 50,000 test samples. The size of the images, as before, is 28x28 and the number of classes is 10.

basic: smaller subset of MNIST.
basic-rot: smaller subset of MNIST with random rotations.
bg-rand: smaller subset of MNIST with uniformly distributed random noise background.
bg-img: smaller subset of MNIST with random image background.
bg-img-rot: smaller subset of MNIST digits with random background image and rotation.

We have also evaluated on the problem of classifying documents into their corresponding topic. We have used a version of the 20-newsgroups dataset [26] for which the training and test sets contain documents collected at different times, a setting that is more reflective of a practical application. The training set consists of 11,269 samples and the test set contains 7,505 examples. We have used the 5000 most frequent words for the binary input features. We follow the same protocol as outlined in [27].

Our third dataset is the GTZAN music genre dataset [28, 29]. The dataset contains three-second audio clips, equally distributed among 10 musical genres: blues, classical, country, disco, hip-hop, pop, jazz, metal, reggae and rock. Each example in the set is represented by 592 Mel-Phon Coefficient (MPC) features. These are a simplified formulation of the Mel-frequency Cepstral Coefficients (MFCCs) that are shown to yield better classification performance. Since there is no predefined standard split and fewer examples, we have used 10-fold cross validation (procedure mentioned in [15]), where each fold consisted of 9000 training examples (we do not require validation examples unlike [28]) and 1000 test examples.

A. Linear vs Non-linear

Most studies in neural networks employ a non-linear activation function. We proposed linear activation owing to the ease of solution. We will show that, at least for the benchmark datasets used in these experiments, the simple linear (identity) activation function yields better classification accuracy than its non-linear (sigmoid) counterpart. The linear autoencoder weights are initialized by solving the least squares problem min_Q ||X - QX||_F^2 and setting W as the top (number of hidden nodes) right singular vectors of Q. For the non-linear autoencoder we use Hinton's implementation [30].
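The deterministic initialization used here and in the algorithm boxes of Section III can be sketched in a few lines. The version below factors H_0 = argmin_H ||X^T - X^T H||_F^2 through its SVD; truncating to n_hidden components and the toy low-rank data are our assumptions, not the authors' settings.

import numpy as np

def init_factors(X, n_hidden):
    # minimum-norm least squares solution of X^T H ~= X^T
    H0 = np.linalg.lstsq(X.T, X.T, rcond=None)[0]
    U, s, Vt = np.linalg.svd(H0)
    W_T = U[:, :n_hidden] * s[:n_hidden]   # W_0^T = U S (truncated)
    Wd_T = Vt[:n_hidden, :]                # W'_0^T = V^T (truncated)
    return W_T.T, Wd_T.T                   # encoder W_0, decoder W'_0

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 10)) @ rng.standard_normal((10, 200))  # rank-10 toy data
W0, Wd0 = init_factors(X, n_hidden=32)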

The autoencoder architecture remains the same otherwise; both (linear and non-linear) are three layer architectures with hidden nodes. The representation from the deepest layer is used for classification. We employ two non-parametric classifiers: KNN (K=1) and Sparse Representation based Classification (SRC) [31]. We want to test the representation / feature extraction capability of the linear and non-linear autoencoders; this is best done using simple non-parametric classifiers. Parametric classifiers like NN and SVM may be fine tuned to yield better results, but in such a case it is difficult to gauge if the improvement in results is owing to the feature extraction or owing to the fine tuning.

TABLE I. LINEAR VS NON-LINEAR ACTIVATION. Columns: KNN (linear / non-linear) and SRC (linear / non-linear); rows: MNIST, basic, basic-rot, bg-img, bg-rand, bg-img-rot, GTZAN.

The results show that the linear one always yields better results. The improvement is small when the number of training samples is larger, but for the more challenging datasets the linear autoencoder improves by a large margin.

B. Classification Performance

We compare our results with the stacked autoencoder (SAE), contractive autoencoder (CAE) and deep belief network (DBN) [8]. The SAE and CAE use linear activation. As before, the representation from the deepest layer is used as features. For our sparsely connected autoencoder λ = 0.01 is used. In each of the tables, the best results are shown in bold.

TABLE II. KNN (K=1) RESULTS; TABLE III. SRC RESULTS; TABLE IV. SVM RESULTS. Columns in each: SAE, CAE, DBN, l0-norm, l1-norm; rows: MNIST, basic, basic-rot, bg-rand, bg-img, bg-img-rot, GTZAN.

The results show that our proposed method yields better results than SAE and DBN on all the datasets (except for the larger MNIST with KNN). Under a fair comparison (keeping the classifiers the same and non-parametric) one can say that our method yields a better representation than other deep learning techniques like SAE, CAE and DBN.

Next (Table V) we compare the results with the stacked denoising autoencoder (SDAE), deep belief network (DBN), sparse deep neural network (SDNN) [33] and optimal brain damage (OBD) [1]. We repeat the results from the l0-norm sparsely connected autoencoder with SRC (since these are the best results we obtained). SDAE and DBN use fine tuning with a neural network classifier in the final stage. SDNN is a contemporary sparse deep classifier and OBD is a classical work with a shallow architecture. The results show that our method yields results which are at par with SDAE and DBN and are better than the sparse neural networks.

TABLE V. COMPARATIVE RESULTS. Columns: SDAE, DBN, SDNN, OBD; rows: MNIST, basic, basic-rot, bg-rand, bg-img, bg-img-rot, GTZAN.
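The evaluation protocol above (deepest-layer features scored by a non-parametric classifier) can be sketched as follows. This is our illustration rather than the experimental code; the encoder W, toy data and labels are stand-ins.

import numpy as np

def knn1_predict(train_feats, train_labels, test_feats):
    """1-NN rule: each test point takes the label of its nearest training point."""
    # pairwise squared Euclidean distances, shape (n_test, n_train)
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=2)
    return train_labels[np.argmin(d2, axis=1)]

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64))               # stand-in learned encoder
X_tr, y_tr = rng.random((64, 100)), rng.integers(0, 10, 100)
X_te = rng.random((64, 20))
pred = knn1_predict((W @ X_tr).T, y_tr, (W @ X_te).T)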

We have compared the training times of the different autoencoders and the DBN. The results are shown in Table VI. Both our proposed methods are significantly faster than the others. The computational cost per iteration is higher for us, but the algorithms converge faster. The results are only shown on the large MNIST dataset and on MNIST basic.

TABLE VI. TRAINING TIME IN MINUTES

Method                              MNIST   basic
DBN                                 78      1
Stacked Autoencoder (non-linear)    -       -
Stacked Autoencoder (linear)        -       -
Contractive Autoencoder (linear)    98      4
l1-norm                             4       1
l0-norm                             50      6

The configuration of the machine running these experiments is: RAM 4 GB; OS Red Hat Enterprise Linux Server release 7.0 (Maipo); CPU Intel(R) Xeon(R) E5-2430, two CPUs of 6 cores each. Simulation on Matlab R2014.

C. Denoising Results

Autoencoders have been used previously for image denoising. In [11] it was shown that an autoencoder with sparse features leads to good denoising results. They showed results for Gaussian and impulse denoising. It is not optimal to remove impulse noise with autoencoders; this is because impulse noise is sparse. Since we are formulating an autoencoder with an l2-norm data fidelity, we can optimally remove Gaussian noise only; this is the problem we address in this work.

For comparison, we use the standard metrics for image quality assessment: PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity Index) [34]. We compare our approach (SparseConnect) with the sparse autoencoder [11] and the stacked denoising autoencoder (SDAE). We use single layer autoencoders for image denoising. The number of nodes in the hidden layer is kept to be 512. The value of λ is 2.

We carried out experiments on the grayscale CIFAR-10 dataset. The CIFAR-10 dataset is composed of 10 classes of natural images with 50,000 training examples in total, 5,000 per class. Each image is of size 32x32. For these experiments the colour images have been converted to greyscale. Zero mean Gaussian noise was added to these images. The noisy images served as the input to the autoencoders and the clean images were the output. For testing, the noisy test images served as inputs and the image obtained at the output was compared with the clean image to test the denoising performance. The results are shown in the following table. The PSNR and SSIM values shown here are the means over the 10,000 test images.

TABLE VII. DENOISING RESULTS (mean PSNR in dB / SSIM)

Noise variance   Noisy image   SDAE        Sparse Autoencoder   SparseConnect (l1-norm)   SparseConnect (l0-norm)
0.01             - / -         21.95 / -   22.94 / -            23.90 / 0.738             26.03 / 0.805
0.04             - / 0.356     21.66 / -   22.63 / -            23.53 / 0.707             25.68 / 0.798
0.09             - / 0.011     21.30 / -   22.5 / -             23.01 / 0.675             25.7 / -

The improvement is significant. Usually in the image denoising literature a PSNR improvement of 0.5 dB to 1 dB is considered to be good. In this case the improvement is near about 3 dB compared to the sparse denoising autoencoder. Also the improvement in SSIM is around 0.1; this is a huge improvement. For visual evaluation some sample images are shown in Fig. 5.

Fig. 5. Left to right: original test image, noisy image, SDAE, Sparse Autoencoder, l1-norm and l0-norm.

The denoising results may not be at par with the state-of-the-art like BM3D or KSVD, but they are better than competing autoencoder based techniques. The proposed SparseConnect autoencoder gets the best denoising results, balancing noise and sharpness.
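For reference, the PSNR figures in Table VII follow the standard definition for images scaled to [0,1]. A minimal sketch (SSIM [34] is omitted since it needs a full implementation such as scikit-image's; the toy image is an assumption):

import numpy as np

def psnr(clean, estimate, peak=1.0):
    """Peak Signal to Noise Ratio in dB: 10 log10(peak^2 / MSE)."""
    mse = np.mean((clean - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.random((32, 32))                    # toy 32x32 grayscale image
noisy = np.clip(clean + rng.normal(0.0, np.sqrt(0.01), clean.shape), 0.0, 1.0)
print(f"noisy-image PSNR: {psnr(clean, noisy):.2f} dB")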

V. CONCLUSION

This work proposes the concept of sparse connections in autoencoders. Although there are several studies on sparsely connected neural networks, there is no prior study on sparsely connected autoencoders; this is the first work in that respect. All prior studies on sparse autoencoders concentrate on the problem of sparse activity. The motivation is drawn from the success of DropOut and DropConnect neural networks, where over-fitting was prevented by randomly switching off some activations or connections during training. Instead of using such a stochastic regularization technique, our proposed method deterministically learns the sparse connections. It keeps the relevant connections and prunes the unimportant ones. This is achieved by introducing sparsity promoting regularization penalties on the autoencoder weights.

Experiments were carried out for two benchmark autoencoder tasks: classification and denoising. For classification, comparison is made with the SAE, CAE, DBN, SDAE, sparse deep neural network (SDNN) and optimal brain damage (OBD). Under a fair comparison (when non-parametric classifiers are used) our method outperforms the others. Even against fine-tuned neural network architectures, our proposed approach yields better results than SDNN and OBD and performs at par with densely connected networks like SDAE and DBN. For denoising, we compared against the denoising autoencoder and the sparse autoencoder [32]. Even for this task our proposed approach yields considerably better results.

REFERENCES

[1] Y. LeCun, "Optimal Brain Damage", Advances in Neural Information Processing Systems, 1990.
[2] M. Thom and G. Palm, "Sparse Activity and Sparse Connectivity in Supervised Learning", Journal of Machine Learning Research, Vol. 14, 2013.
[3] V. Gripon, "Sparse Neural Networks With Large Learning Diversity", IEEE Transactions on Neural Networks and Learning Systems, Vol. 22(7), 2011.
[4] X. Glorot, A. Bordes and Y. Bengio, "Deep Sparse Rectifier Neural Networks", AISTATS 2011.
[5] L. Toth, "Phone Recognition with Deep Sparse Rectifier Neural Networks", ICASSP 2013.
[6] D. Yu, F. Seide, G. Li and L. Deng, "Exploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition", ICASSP 2012.
[7] B. Liu, M. Wang, H. Foroosh, M. Tappen and M. Pensky, "Sparse Convolutional Neural Networks", CVPR 2015.
[8] H. Awano, S. Nishide, H. Arie, J. Tani, T. Takahashi, H. G. Okuno and T. Ogata, "Use of a Sparse Structure to Improve Learning Performance of Recurrent Neural Networks", Neural Information Processing, Lecture Notes in Computer Science.
[9] A. Ng, "Sparse Autoencoder", CS294A Lecture Notes, vol. 72, 2011.
[10] A. Makhzani and B. Frey, "k-Sparse Autoencoders", ICLR 2014.
[11] K. H. Cho, "Simple Sparsification Improves Sparse Denoising Autoencoders in Denoising Highly Noisy Images", ICML 2013.
[12] N. Jiang, W. Rong, B. Peng, Y. Nie and Z. Xiong, "An empirical analysis of different sparse penalties for autoencoder in unsupervised feature learning", IJCNN 2015.
[13] Y. Liu, L. Zhang, B. Wang and J. Yang, "Feature ensemble learning based on sparse autoencoders for image classification", IJCNN 2014.
[14] Y. Bengio, "Learning deep architectures for AI", Foundations and Trends in Machine Learning, 2(1), 2009.
[15] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion", Journal of Machine Learning Research, Vol. 11, pp. 3371-3408, 2010.
[16] M. Chen, K. Weinberger, F. Sha and Y. Bengio, "Marginalized Denoising Autoencoders for Nonlinear Representation", ICML 2014.
[17] S. Rifai, P. Vincent, X. Muller, X. Glorot and Y. Bengio, "Contractive auto-encoders: Explicit invariance during feature extraction", ICML 2011.
[18] H. M. Abbas, "Analysis and pruning of nonlinear auto-association networks", IEE Proceedings on Vision, Image and Signal Processing, Vol. 151(1), 2004.
[19] D. Donoho, "Compressed sensing", IEEE Transactions on Information Theory, 52(4), pp. 1289-1306, April 2006.
[20] E. Candès and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?", IEEE Transactions on Information Theory, 52(12), pp. 5406-5425, December 2006.
[21] B. K. Natarajan, "Sparse approximate solutions to linear systems", SIAM Journal on Computing, 24 (1995), 227-234.
[22] Sparse Signal Restoration: cnx.org/content/m32168/latest/
[23] Y. C. Pati, R. Rezaiifar and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition", Asilomar Conference on Signals, Systems and Computers, pp. 40-44, 1993.
[24] J. A. Tropp and A. C. Gilbert, "Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit", IEEE Transactions on Information Theory, Vol. 53(12), pp. 4655-4666, 2007.
[25] T. Blumensath and M. E. Davies, "Iterative Thresholding for Sparse Approximations", Journal of Fourier Analysis and Applications, Vol. 14(5), 2008.
[26]
[27] H. Larochelle and Y. Bengio, "Classification using Discriminative Restricted Boltzmann Machines", International Conference on Machine Learning, 2008.
[28] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals", IEEE Transactions on Audio and Speech Processing, 2002.
[29]
[30]
[31] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry and Y. Ma, "Robust face recognition via sparse representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), pp. 210-227, 2009.
[32]
[33] deep-neural-network
[34] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
