Learning Spherical Convolution for Fast Features from 360° Imagery

Anonymous Author(s)

In this file we provide additional details to supplement the main paper submission. In particular, this document contains:

1. Figure illustration of the spherical convolution network structure
2. Implementation details, in particular the learning process
3. Data preparation process of each dataset
4. Complete experiment results
5. Additional object detection results on Pascal, including both success and failure cases
6. Complete visualization of the AlexNet conv1 kernels in spherical convolution

1 Spherical Convolution Network Structure

Fig. 1 shows how the proposed spherical convolutional network differs from an ordinary convolutional neural network (CNN). In a CNN, each kernel convolves over the entire 2D map to generate a 2D output. Alternatively, it can be considered as a neural network with a tied weight constraint, where the weights are shared across all rows and columns. In contrast, spherical convolution only ties the weights along each row. It learns a kernel for each row, and the kernel only convolves along the row to generate 1D output. Also, the kernel size may differ at different rows and layers, and it expands near the top and bottom of the image.

2 Additional Implementation Details

We train the network using ADAM [1]. For pre-training, we use the batch size of 56 and initialize the learning rate to.. For layers without batch normalization, we train the kernel for 6, iterations and decrease the learning rate by every 4, iterations. For layers with batch normalization, we train for 4, iterations and decrease the learning rate every, iterations. For fine-tuning, we first fine-tune the network on conv3_3 for, iterations with batch size of. The learning rate is set to -5 and is divided by after 6, iterations. We then fine-tune the network on conv5_3 for,48 iterations. The learning rate is initialized to -4 and is divided by after,4 iterations. We do not insert batch normalization in conv_ to conv3_3 because we empirically find that it increases the training error.
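The row-wise weight tying described in Section 1 can be illustrated with a small sketch. This is not our implementation; it is a minimal single-channel example in plain Python. The function name `sphconv_layer`, the wrap-around in the horizontal direction, and the clamping of rows at the top and bottom are all assumptions made for illustration.

```python
def sphconv_layer(feature_map, row_kernels):
    """Apply one untied kernel per output row (illustrative sketch).

    feature_map: list of H rows, each a list of W numbers (single channel).
    row_kernels: list of H kernels; row_kernels[y] is a 2D list with odd
                 height and odd width, and its size may differ per row.
    """
    height, width = len(feature_map), len(feature_map[0])
    output = []
    for y in range(height):
        kernel = row_kernels[y]          # weights are shared only along row y
        kh, kw = len(kernel), len(kernel[0])
        out_row = []
        for x in range(width):
            acc = 0.0
            for dy in range(kh):
                # Assumption: rows outside the image are clamped to the border.
                yy = min(max(y + dy - kh // 2, 0), height - 1)
                for dx in range(kw):
                    # Assumption: columns wrap around, since the horizontal
                    # axis of an equirectangular image is periodic.
                    xx = (x + dx - kw // 2) % width
                    acc += kernel[dy][dx] * feature_map[yy][xx]
            out_row.append(acc)
        output.append(out_row)
    return output
```

Passing a 1x3 kernel for one row and a 1x1 kernel for another shows the two properties from the text: each row has its own weights, and the kernel size can vary from row to row.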
3 Data Preparation

This section provides more details about the dataset splits and sampling procedures.

Pano2Vid

For the Pano2Vid dataset, we discard videos with resolution W ≠ 2H and sample frames at .5fps. We use Mountain Climbing for testing because it contains the smallest number of frames. Note that the training data contains no instances of Mountain Climbing, such that our network is forced to generalize across semantic content. We sample at a low frame rate in order to reduce temporal redundancy in both training and testing splits. For kernel-wise pre-training and testing, we sample the output on 4 pixels per row uniformly to reduce spatial redundancy. Our preliminary experiments show that a denser sample for training does not improve the performance.

Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017). Do not distribute.

Figure 1: Spherical convolution illustration. The kernel weights at different rows of the image are untied, and each kernel convolves over one row to generate 1D output. The kernel size also differs at different rows and layers.

PASCAL VOC 2007

As discussed in the main paper, we transform the 2D PASCAL images into equirectangular projected 360° data in order to test object detection in omnidirectional data while still being able to rely on an existing ground truthed dataset. For each bounding box, we resize the image so the short side of the bounding box matches the target scale. The image is backprojected to the unit sphere using P, where the center of the bounding box lies on n̂. The unit sphere is unwrapped into equirectangular projection as the test data. We resize the bounding box to three target scales {112, 224, 336} corresponding to {0.5R, 1.0R, 1.5R}, where R is the Rf of N_p. Each bounding box is projected to 5 tangent planes with φ = 180° and θ ∈ {36°, 72°, 108°, 144°, 180°}. By sampling the boxes across a range of scales and tangent plane angles, we systematically test the approach in these varying conditions.

4 Complete Experimental Results

This section contains additional experimental results that do not fit in the main paper.

Figure 2: Network output error. [Plots omitted: RMSE panels for conv, conv3_3, conv4_3 and conv5_3; legend: Direct, Interp, Perspective, Exact, OptSphConv, SphConv-Pre, SphConv.]

Fig. 2 shows the error of each meta layer in the VGG architecture. This is the complete version of Fig. 4a in the main paper. It becomes more clear to what extent the error of SPHCONV increases as we go deeper in the network as well as how the error of INTERP decreases.
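As a rough illustration of how an error curve of this kind can be computed, the sketch below evaluates the RMSE between an exact feature map and an approximation separately for each row, and maps a row index to a polar angle. The function names, the single-channel input, and the assumption that rows span 0° to 180° uniformly are ours for illustration; the exact evaluation protocol is the one described in the main paper.

```python
import math


def rowwise_rmse(exact, approx):
    """Return one RMSE value per row of two equally sized 2D maps."""
    errors = []
    for exact_row, approx_row in zip(exact, approx):
        squared = [(a - e) ** 2 for e, a in zip(exact_row, approx_row)]
        errors.append(math.sqrt(sum(squared) / len(squared)))
    return errors


def row_polar_angle(row, height):
    """Polar angle (degrees) of a row centre.

    Assumption: the rows of the equirectangular map cover 0..180 degrees
    uniformly from top to bottom.
    """
    return (row + 0.5) * 180.0 / height
```

Grouping the per-row values by polar angle gives one error value per angle for each layer.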

Figure 3: Proposal network accuracy (IoU). [Plots omitted: IoU panels for Scale = 0.5R, Scale = 1.0R and Scale = 1.5R; legend: Direct, Interp, Perspective, Exact, OptConv, SphConv-Pre, SphConv.]

Fig. 3 shows the proposal network accuracy for all three object scales. This is the complete version of Fig. 6b in the main paper. The performance of all methods improves at larger object scales, but PERSPECTIVE still performs poorly near the equator.

5 Additional Object Detection Examples

Figures 4, 5 and 6 show example detection results for SPHCONV-PRE on the 360° version of PASCAL VOC 2007. Note that the large black areas are undefined pixels; they exist because the original PASCAL test images are not 360° data, and the content occupies only a portion of the viewing sphere.

Fig. 7 shows examples where the proposal network generates a tight bounding box while the detector network fails to predict the correct object category. While the distortion is not as severe as in some of the success cases, it makes the confusing cases more difficult. Fig. 8 shows examples where the proposal network fails to generate a tight bounding box. The bounding box shown is the one with the best intersection over union (IoU), which is less than .5 in both examples.
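For reference, the IoU between two axis-aligned boxes, and the selection of the proposal that best matches a ground-truth box, can be sketched as follows. The `(x_min, y_min, x_max, y_max)` box layout and the function names are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_proposal(ground_truth, proposals):
    """Return the proposal with the highest IoU against the ground-truth box."""
    return max(proposals, key=lambda box: iou(ground_truth, box))
```

A failure case of the proposal network in the sense used above is one where even the box returned by `best_proposal` has a low IoU with the ground truth.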

Figure 4: Object detection results on PASCAL VOC 2007 test images transformed to equirectangular projected inputs at different polar angles θ. Black areas indicate regions outside of the narrow field of view (FOV) PASCAL images, i.e., undefined pixels. The polar angle θ = 18°, 36°, 54°, 72° from top to bottom. Our approach successfully learns to translate a 2D object detector trained on perspective images to 360° inputs.

Figure 5: Object detection results on PASCAL VOC 2007 test images transformed to equirectangular projected inputs at θ = 36°.

Figure 6: Object detection results on PASCAL VOC 2007 test images transformed to equirectangular projected inputs at θ = 18°.

Figure 7: Failure cases of the detector network.

Figure 8: Failure cases of the proposal network.

6 Visualizing Kernels in Spherical Convolution

Fig. 9 shows the target kernels in the AlexNet [2] model and the corresponding kernels learned by our approach at different polar angles θ ∈ {9°, 18°, 36°, 72°}. This is the complete list for Fig. 5 in the main paper. Here we see how each kernel stretches according to the polar angle, and it is clear that some of the kernels in spherical convolution have larger weights than the original kernels. As discussed in the main paper, these examples are for visualization only. As we show, the first layer is amenable to an analytic solution, and only layers l > 1 are learned by our method.

Figure 9: Learned conv1 kernels in AlexNet (full). Each square patch is an AlexNet kernel in perspective projection. The four rectangular kernels beside it are the kernels learned in our network to achieve the same features when applied to an equirectangular projection of the 360° viewing sphere.

References

[1] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[2] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.