A linear imaging system with white additive Gaussian noise on the observed data is modeled as follows:

Supplementary Note Mathematcal bacground A lnear magng system wth whte addtve Gaussan nose on the observed data s modeled as follows: X = R ϕ V + G, () where X R are the expermental, two-dmensonal proecton data,, K, =, of =, K, pxels each; V, wth =, K, K, s one of K underlyng three-dmensonal structures; R ϕ are proecton operatons, where ϕ defnes the poston of the threedmensonal structure wth respect to proecton mage X (as descrbed by three Euler angles and two n-plane translatons); and G R zero-mean Gaussan nose wth standard devaton σ. are mages contanng ndependent We optmze the logarthm of the lelhood of observng the entre data set, gven a model wth parameter set Θ : ( Θ) = ( Θ) = ln, () X where ( Θ) X s the probablty of observng data mage X gven all current model parameters. We treat the assgnments to underlyng 3D-structures and postons ϕ as hdden varables. Expressng ( Θ) can be rewrtten as: as an ntegral over all possble assgnments, Eq. X

K ( Θ) = ln ( X,, Θ) (, ϕ Θ) = = ϕ ϕ dϕ (3) where ( X, ϕ,θ) s the probablty of observng data mage X gven, ϕ and Θ ; and (,ϕ Θ) s the probablty densty functon of and ϕ. n (3), we dscretze the ntegral over ϕ by a summaton over p =,..., dstnct proecton drectons (descrbed by combnatons of two Euler angles) and a summaton over q =,..., Q n-plane transformatons (consstng of Q rot dscrete nplane rotatons and Q trans translatons). The log-lelhood functon as defned n Eq. 3 can then be wrtten as: K Q ( Θ) = ( X, Θ) (, ln, (4) = = p= q= Furthermore, we model the 3D-structure V as a summaton over bass functons b l wth coeffcents c l : V l= c l b l. et B pl denote the -dmensonal column vector that s the proecton of b l n drecton p. Then t s easy to see that the proecton of c l b l s c l B pl = l= B p c, where B p denotes the matrx whose l th column s B pl. We use the notaton X to ndcate that X s mapped onto a transformed coordnate system, where q defnes the (n-plane) transformaton. l= Gven, p and we assume that the probablty densty functon of each pxel value n mage X can be descrbed by a Gaussan functon, centered at the value of the correspondng pxel n the p th proecton of the th underlyng 3D structure. The wdth of these Gaussans s equal to the estmated standard devaton σ n the expermental nose. Assumng ndependence among all pxels n mage X,.e. assumng whte, or ndependent nose, ( X, Θ) Gaussan functons: can be expressed as a multplcaton over

( X, Θ) ([ B pc ] [ X ] ) = = exp πσ σ πσ exp X p = B c σ (5) where [ X ] denotes the value of the th pxel of an mage X, and X = [ X ] denotes ts squared norm. = We defne (,,.e. the probablty densty functon of, p and as follows. We assume that partcle pcng has left a two-dmensonal Gaussan dstrbuton of resdual translatons q x and q y, centered at the orgn and wth a standard devaton of ξ pxels. Furthermore, we assume an even dstrbuton for all Q rot n-plane rotatons, and a dscretzed dstrbuton of estmated proportons α p of the data belongng to the p th proecton of the th underlyng 3D-structure (wth α 0 and α = ). Thereby, (, s defned by: p K = p= p α p qx + q y (, = exp (6) Qrot πξ ξ The log-lelhood target functon, as defned by Eqs. 4-6, s optmzed usng an Expectaton-Maxmzaton (EM) algorthm. n the E-step of ths algorthm a lower bound Z to the lelhood s bult, based on the current estmates for the model parameters (n) Θ, n beng the teraton number: Z K Q ( Θ Θ ) = (, q X, Θ ) ln ( X, Θ) (, = = p= q= { },. (7)

Here, (, q, ) Θ s the probablty of, p and gven mage X and the current X estmates for the model. Accordng to Bayes theorem ths probablty can be wrtten as: ( q X, Θ ) ( X, Θ ) (, q Θ ) K Q ( X, Θ ) (, q, = (8) = p= q= Θ ) n the subsequent M-step of the EM-algorthm we maxmze the lower bound to obtan a parameter set for the next teraton ( ( +) Θ n ). Recaptulatng, Θ conssts of an estmate σ for the standard devaton n the expermental nose, an estmate ξ for the standard devaton n the orgn offsets, of classes and proecton drectons and K estmates α p for the probablty densty functons K estmates for coeffcents c l of the dfferent 3D obects. Settng the partal dervatve to the lower bound wth respect to σ to zero, and solvng for σ yelds: K Q (, q X, Θ ) ( [ B pc ] [ X ] ) ( ) σ = (9) = = p= q= = Smlarly, the updated estmate for the standard devaton n the orgn offsets s obtaned by: (, q X, Θ )( q + q ) K Q ( ) ξ = x y (0) = = p= q= For maxmzaton of the lower bound wth respect toα, we ntroduce a agrange K multpler to constran α =, yeldng the followng soluton: = p= p p ( ) p Q = q= (, q X, Θ ) α = ()

To obtan updated estmates for the 3D-reference structures, we set partal dervatves to coeffcents c l to zero. Droppng constants, we get for all l and all : Z ( Θ, Θ ) = c l Q ( q X, Θ ) ( [ B pc ] [ X ] )[ B pl ], = 0. () = p= q= = Defnng: A p Q = q= = Q = q= (, q X, Θ ) X (, q X, Θ ), (3) w = Q ( q, Θ ),, (4) p X = q= we rewrte Eq. (for all and l) as: wp ( [ B pc ] [ Ap ] )[ B pl ] p= = = 0, (5) whch are the normal equatons of the least-squares problem: mn w B c A (6) p= p p p snce settng partal dervatves of Eq. 6 wth respect to c l equal to zero yelds Eq. 5. Therefore, maxmzng the lelhood through solvng Eq. for all and l s equvalent to solvng the set of K weghted-least-squares problems n Eq. 6. For ths purpose, we apply the followng modfed algebrac reconstructon technque (ART; based on the methodology ntroduced by Eggermont, Herman and ent ) to all -dmensonal vectors c :

For the nner teraton m = 0,,..., M and p = + m mod( + ), whenever p,, ) ) T ) c = c + w B d, (7), ) p' ) p' p p r = r for p' p, (8) r = r + d, (9), ) ) ) p p where the correcton mage d ) R s calculated as: ) [ ] ) ) ([ Ap ] [ B pc ] ) [ rp ] wp d = λ for all =,...,. (0) T w B B + p p p For the specal case when p = +,, ) ) c = c, () r, ) p ) p' T ' = r κ w ' B w '' B '' r for all p '=,...,. () p p' p'' = p p ) p'' Here, κ and λ are relaxaton parameters, B p s the th row of B p, and r s a vector of auxlary mages rp (0, ) R, wth [ p ] = 0 r for all, and p. For the frst teraton of the expectaton maxmzaton algorthm (n = 0), we start the teratve reconstructon (0,) algorthm from all-zero maps ( c = 0, for all l and ); for all subsequent teratons we 0, ) start from the resultng map of the prevous teraton ( c = c l ( ( M, n) l l, for all l and ). After a user-provded number of nner teratons M for the modfed ART algorthm, ( ) ( M, ) new estmates for all K 3D-structures are obtaned as: c = c.

mplementaton Detals The proposed classfcaton approach entals a maxmum-lelhood multreference angular refnement algorthm that has been mplemented n the Mrefne3D program of the open-source pacage Xmpp, whch s freely avalable through http://xmpp.cnb.csc.es. For each teraton, the K reference maps are proected onto a lbrary of K reference mages, accordng to an even angular dstrbuton wth a user-provded samplng rate. Subsequently, all expermental mages are presented to the entre reference lbrary, evaluatng Eq. 8 to calculate weghts (, q, ) Θ for all Q n-plane transformatons. To lmt computatonal requrements, a prevously reported reduced search-space approach s appled to the ntegraton over all orgn offsets 3. n addton, once the proecton orentatons are nown wth relatve confdence, the user may opt to further reduce the ntegratons to local searches around the optmal angular and translatonal assgnments from the prevous teraton (mplctly assumng zero pror probabltes for assgnments outsde the regons of lmted ntegratons). After all relevant probablty weghts have been evaluated, new estmates for the standard devaton n the nose and for the probablty densty functons of the hdden parameters are obtaned accordng to Eqs. 9-, and updated estmates for the three-dmensonal structures are obtaned by performng K ndependent modfed ART-reconstructons accordng to Eqs. 7-. Then, the newly derved parameter set ( +) Θ n X s used for the next teraton of the expectaton maxmzaton algorthm, untl a user-defned maxmum number of teratons N has been reached. All lelhood calculatons presented n ths wor were performed usng the followng set of default parameters. By default, the ntal estmate for the standard devaton n the nose s set to, the ntal estmate for the standard devaton n the orgn offsets s set to 3 pxels, and a unform dstrbuton s used to ntalze all α p. For all 3D reconstructons, relaxaton parameters λ = 0. and κ = 0. 5 are used to perform M = 0 teratons of the modfed ART algorthm. For the bass functons of the reconstructed obects we use blobs as defned by Marabn et al. 4, wth the Xmpp default parameters of a =, m = and α = 0. 4.

Snce fast two-dmensonal Fourer transforms are employed for the ntegratons over multple translatons, the requred computng tmes are of order ( ) O log for pxels. Furthermore, the computaton tmes depend lnearly on the number of expermental mages, the number of mages n the reference lbrary, and the number of sampled n-plane rotatons. The rate-lmtng step s the evaluaton of probablty weghts (, q, ) Θ for all mages (Eq. 8), snce t mples an extensve search X over many n-plane transformatons and reference proectons. Fortunately, ths step can be parallelzed wth relatve ease, and the current mplementaton employs a message passng nterface (M) to perform these calculatons ndependently for multple subsets of the data. We dd not mplement a parallel verson of the modfed ART algorthm, as the K reconstructons are already ndependent and typcally requre two orders of magntude less CU tme than the calculaton of the probablty weghts. To further lmt the requred computng tmes, the current mplementaton stores the Fourer transforms of multple n-plane rotated versons of all reference proectons n memory. Smlarly, rotated versons of all (runnng) weghted sums for mages A p (Eq. 3) are also stored n memory. Ths speeds up the evaluaton of the probablty weghts, as only a lmted amount of mage rotatons needs to be performed for each of the expermental mages. However, ths also mples that the memory requrements ncrease rapdly wth ncreasng angular samplng rates. Whle a 0 samplng wth K = 4 and mages of 64 64 pxels requre approxmately 900 Mb, a 5 samplng would tae approxmately 8 tmes more, exceedng the lmts of our currently avalable hardware. Although obvously memory may be traded for computng or ds-access tme, we as yet dd not perform such computatonally demandng tests. Wth the current mplementaton, classfcaton of the large T-antgen data too one CU month (on an ntel Xeon 3GHz processor), performng 0 teratons of lelhood optmzaton wth exhaustve angular searches and an angular samplng rate of 5. For the rbosome data, 5 teratons of lelhood optmzaton wth an angular samplng rate of 0 too a total of 6 CU months. n ths case, the calculatons were consderably speeded up by local searches. Whle exhaustve searches requred on average 8 CU days per teraton durng the frst 7 teratons, local searches of +/- 0

n proecton drectons and +/- pxels n orgn offsets requred approxmately 3 CU days per teraton durng the remanng 8 teratons. References. Eggermont,.. B., Herman, G. T. & ent, A. teratve algorthms for large parttoned lnear systems, wth applcatons to mage reconstructon. near Algebra and ts Applcatons 40, 37-67 (98).. Sorzano, C. O. S. et al. XM: a new generaton of an open-source mage processng pacage for electron mcroscopy. Struct Bol 48, 94-04 (004). 3. Scheres, S. H. W., Valle, M. & Carazo,. M. Fast maxmum-lelhood refnement of electron mcroscopy mages. Bonformatcs Suppl, 43-44 (005). 4. Marabn, R., Herman, G. T. & Carazo,. M. 3D reconstructon n electron mcroscopy usng ART wth smooth sphercally symmetrc volume elements (blobs). Ultramcroscopy 7, 53-65 (998).