Face Recognition CS 663

1 Face Recognition CS 663

2 Importance of face recognition The most common way for humans to recognize each other. Study of the process of face recognition has applications in (1) security/surveillance/authentication, (2) understanding of visual psychology, (3) automated tagging on Facebook.

3 Face recognition: problem statement Given a database of face images of people, and a new test image, answer the following question: The test image contains the face of which individual from the database?

4

5 A naïve method Compare the test image with each database image in terms of SSD (sum of squared differences). Choose the closest match (i.e., in terms of squared difference)! $\text{SSD}(I, J) = \sum_j (I_j - J_j)^2$, where the sum runs over all pixels j. This method is fraught with problems!
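
As a concrete illustration, here is a minimal sketch of the naïve SSD matcher (hypothetical function name; assumes the test and gallery images are already grayscale NumPy arrays of identical size):

```python
import numpy as np

def naive_ssd_match(test_img, gallery_imgs):
    """Return the index of the gallery image closest to test_img in SSD."""
    ssd = [np.sum((test_img.astype(float) - g.astype(float)) ** 2)
           for g in gallery_imgs]
    return int(np.argmin(ssd))
```

As the slide warns, this breaks down under even small misalignment, lighting, pose or expression changes, which is what motivates the rest of the lecture.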

6 Challenges in face recognition: detection Where is (are) the face(s) in the picture?

7 Challenges in face recognition: pose variation Images are 2D, the face is a 3D object. There will be out-of-plane rotations (profile versus frontal), and a change in apparent size due to change in distance from the camera.

8 Challenges in face recognition: illumination variation Multiple light sources. Change in direction of lighting sources. Change in intensity/color/type of light source. Shadows, specular reflections.

9 Challenges in face recognition: expression variation Varied expressions: smile, anger, frown, sadness, surprise, closed eyes, confusion, etc.

10 Challenges in face recognition: age variation

11 Challenges in face recognition: variation of facial accessories Spectacles, beard and moustache, scarves, ear-rings, change of hair-styles, etc.

12 Challenges in face recognition: caricatures/paintings

13 Challenges in face recognition: blur/noise/scanner artifacts

14 And more! Even ignoring changes of pose, illumination, expression, etc., we tend to look different at different periods of time! Recognition still remains a challenge!

15 Face Recognition system: block diagram (TRAINING PHASE!) (1) Collect a database of face images of people. Record one or multiple images per person (called the gallery image(s) of the person). (2) Normalize the images: (manually) crop out the face from the overall image background, correct for pose variation or lighting changes. (3) Extract relevant features from the normalized face image (more on this later!). (4) Label the features extracted from the image with that person's identity, and store them in a database.

16 Face Recognition system: block diagram (TESTING PHASE!) (1) Collect an image of the person whose identity is to be determined*, called the probe image. In most cases the time gap between acquisition of probe and gallery images is significant (months/years). (2) Normalize the probe image: (manually) crop out the face from the overall image background, correct for pose variation or lighting changes. (3) Extract the same features from the normalized face image (more on this later!) as from the gallery images. (4) Find the gallery image whose features most closely match (nearest neighbor search) those of the probe image. That tells you the identity. *For now, we assume that the person whose identity is to be determined has images recorded in the gallery database.

17 Problems related to face recognition Face verification: given two face images, determine whether they belong to the same individual (without concern for the individual's identity).

18 Problems related to face recognition Ethnicity/gender identification from face images. Is the given face image a photo or is it a painting/caricature?

19 What features to extract for face recognition? Many methods: (1) Detect visible features: eyes, nose, mouth, high-points of the cheek, chin, eyebrows, etc. Not very robust! (2) Statistical holistic approaches: extract features using a statistical method. These features may not necessarily have a physical interpretation (in terms of, say, eyes, nose, mouth, etc.).

20 Eigenfaces! We focus on the latter group of techniques in these lectures. One such technique for face recognition, called Eigenfaces, uses a statistical method called Principal Components Analysis (PCA).

21 Principal Components Analysis (PCA) Consider N vectors (or points), each containing d elements, each represented as a column vector. We say: $x_i \in \mathbb{R}^d$, $1 \le i \le N$. d could be very large, like 50,000 or more. Our aim is to extract some k features from each $x_i$, k << d. Effectively we are projecting the original vectors from a d-dimensional space to a k-dimensional space. This is called dimensionality reduction. PCA is one method of dimensionality reduction.

22 PCA How do we pick the k appropriate features? We look into a notion of compressibility: how much can the data be compressed, allowing for some small errors?

23 PCA: Algorithm
1. Compute the mean of the given points: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
2. Deduct the mean from each point: $\tilde{x}_i = x_i - \bar{x}$
3. Compute the covariance matrix of these mean-deducted points: $C = \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$, $C \in \mathbb{R}^{d \times d}$. Note: C is a symmetric matrix and it is positive-semidefinite.

24 PCA: algorithm
4. Find the eigenvectors of C: $CV = V\Lambda$, $V \in \mathbb{R}^{d \times d}$, $\Lambda \in \mathbb{R}^{d \times d}$; V = matrix of eigenvectors (each column is an eigenvector), Λ = diagonal matrix of eigenvalues. Note: V is an orthonormal matrix (i.e. $VV^T = V^T V = I$), as C is a covariance matrix and hence it is symmetric. Note: Λ contains non-negative values on the diagonal (eigenvalues).
5. Extract the k eigenvectors corresponding to the k largest eigenvalues. This is called the extracted eigenspace: $\hat{V}_k = V(:, 1:k)$. There is an implicit assumption here that the first k indices indeed correspond to the k largest eigenvalues. If that is not true, you would need to pick the appropriate indices.

25 PCA: algorithm
6. Project each point onto the eigenspace, giving a vector of k eigen-coefficients for that point: $\alpha_{ik} = \hat{V}_k^T \tilde{x}_i$, $\alpha_{ik} \in \mathbb{R}^k$. As V is orthonormal, we have
$\tilde{x}_i = V\alpha_i = V(:,1)\,\alpha_i(1) + V(:,2)\,\alpha_i(2) + \ldots + V(:,d)\,\alpha_i(d)$
$\tilde{x}_i \approx \hat{V}_k \alpha_{ik} = \hat{V}_k(:,1)\,\alpha_{ik}(1) + \hat{V}_k(:,2)\,\alpha_{ik}(2) + \ldots + \hat{V}_k(:,k)\,\alpha_{ik}(k)$
We are representing each face as a linear combination of the k eigenvectors corresponding to the k largest eigenvalues. The coefficients of the linear combination are the eigen-coefficients. Note that $\alpha_{ik}$ is a vector of the eigen-coefficients of the i-th sample point, and it has k elements. The j-th element of this vector is denoted as $\alpha_{ik}(j)$.
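
A minimal sketch of steps 1 to 6 in NumPy (hypothetical function name; assumes the N data points are the columns of a d x N array):

```python
import numpy as np

def pca(X, k):
    """PCA as in steps 1-6. X is d x N (one point per column), k << d."""
    x_bar = X.mean(axis=1, keepdims=True)       # step 1: mean
    X_tilde = X - x_bar                         # step 2: mean-deduction
    C = X_tilde @ X_tilde.T                     # step 3: d x d covariance (scatter) matrix
    evals, V = np.linalg.eigh(C)                # step 4: eigenvectors of symmetric C
    order = np.argsort(evals)[::-1]             # sort eigenvalues in decreasing order
    V_k = V[:, order[:k]]                       # step 5: top-k eigenvectors
    alpha = V_k.T @ X_tilde                     # step 6: eigen-coefficients, k x N
    return x_bar, V_k, alpha
```

Forming the d x d matrix C directly is exactly the cost problem raised on slide 35; the N << d variant later in the lecture avoids it.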

26 PCA and Face Recognition: Eigen-faces Consider a database of cropped, frontal face images (which we will assume are aligned and under the same illumination). These are the gallery images. We will reshape each such image (a 2D array of size H x W after cropping) to form a column vector of d = HW elements. Each image will be a vector $x_i$, as per the notation on the previous two slides. We then carry out the six steps mentioned before. The eigenvectors that we get in this case are called eigenfaces. Each eigenvector has d elements. If you reshape those eigenvectors to form images of size H x W, those images look like (filtered!) faces.

27 Example A face database jh78_caj65/

28 Top 5 Eigen-faces for this database! jh78_caj65/

29 PCA and Face recognition: Eigenfaces For each gallery image, you compute the eigen-coefficients. You then store the eigen-coefficients and the identity of the person in a database. You also store $\hat{V}_k$ and $\bar{x}$ in the database. During the testing phase, you are given a probe image (say) $z_p$ in the form of a column vector of HW elements. You deduct the mean image from $z_p$: $\tilde{z}_p = z_p - \bar{x}$.

30 PCA and Face recognition: Eigenfaces You then project the mean-deducted face image onto the eigen-space: $\alpha_p = \hat{V}_k^T \tilde{z}_p$ (the eigen-coefficients of the probe image $z_p$). Now, compare $\alpha_p$ with all the $\alpha_{ik}$ (eigen-coefficients of the gallery images) in the database. Find the closest match in terms of the squared distance between the eigen-coefficients. That gives you the identity (see next slide).

31 PCA and Face recognition: Eigenfaces $j_p = \arg\min_l \|\alpha_p - \alpha_l\|^2$, where $\alpha_p$ = eigen-coefficients of the probe image $z_p$ and $\alpha_l$ = eigen-coefficients of the l-th gallery image $x_l$. Note: other distance measures (different from the sum of squared differences) may also be employed. One example is the sum of absolute differences, given as follows: $\sum_j |\alpha_p(j) - \alpha_l(j)|$. Another could be the normalized dot product (and this distance measure should be maximized!): $\dfrac{\alpha_p^T \alpha_l}{\|\alpha_p\|\,\|\alpha_l\|}$.
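
A sketch of the testing phase on slides 29 to 31 (hypothetical names; x_bar, V_k and the gallery coefficients alpha are assumed to come from a training sketch like the one above):

```python
import numpy as np

def identify(z_p, x_bar, V_k, alpha_gallery, labels):
    """Match a probe image (flattened to HW elements) against the gallery.

    alpha_gallery: k x N matrix of gallery eigen-coefficients.
    labels: list of N identities, one per gallery image.
    """
    alpha_p = V_k.T @ (z_p - x_bar.ravel())                       # project mean-deducted probe
    d2 = np.sum((alpha_gallery - alpha_p[:, None]) ** 2, axis=0)  # squared distances
    return labels[int(np.argmin(d2))]                             # nearest neighbour
```

Swapping the squared distance for np.sum(np.abs(...)) or a normalized dot product gives the alternative measures mentioned on this slide.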

32 PCA and Face recognition: eigenfaces The eigen-face images contain more and more high-frequency information as the corresponding eigenvalues decrease. Although PCA is a technique known for a long time, its application in face recognition was pioneered by Turk and Pentland in a classic paper in 1991. M. Turk and A. Pentland (1991). Eigenfaces for recognition, Journal of Cognitive Neuroscience, 3(1): 71-86.

33

34 PCA and Face recognition: eigenfaces We can regard the k eigenfaces as key signatures. We express each face image as a linear combination of these eigenfaces, i.e. the average face + (say) 3 times eigenface 1 + (say) 5 times eigenface 2 + (say) -1 times eigenface 3, and so on (note: 3, 5, -1 here are the eigen-coefficients, and some of them can be negative).
$\tilde{x} = V\alpha = V(:,1)\,\alpha(1) + V(:,2)\,\alpha(2) + \ldots + V(:,d)\,\alpha(d) \approx \hat{V}_k \alpha_k = \hat{V}_k(:,1)\,\alpha_k(1) + \ldots + \hat{V}_k(:,k)\,\alpha_k(k)$

35 One word of caution: Eigen-faces The algorithm described earlier is computationally infeasible for eigen-faces, as it requires storage of a d x d covariance matrix (d = the number of image pixels, which could be more than 10,000). And the computation of the eigenvectors of such a matrix is an $O(d^3)$ operation! We will study a modification to this that will bring down the computational cost drastically.

36 Eigen-faces: reducing computational complexity. Consider the covariance matrix: $C = \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$, $C \in \mathbb{R}^{d \times d}$. Note: C is a symmetric matrix and it is positive-semidefinite. It will require too much memory if d is large, and computing its eigenvectors will be a horrendous task! Consider the case when N is much less than d. This is very common in face recognition applications. The number of training images is usually much smaller than the size of the image.

37 Eigen-faces: reducing computational complexity. In such a case, the rank of C is at most N-1. So C will have at most N-1 non-zero eigenvalues. We can write C in the following way: $C = XX^T$, where $X = [\tilde{x}_1 \; \tilde{x}_2 \; \ldots \; \tilde{x}_N] \in \mathbb{R}^{d \times N}$.

38 Back to Eigen-faces: reducing computational complexity. Consider the matrix $X^T X$ (size N x N) instead of $XX^T$ (size d x d). Its eigenvectors are of the form: $X^T X w = \lambda w$, $w \in \mathbb{R}^N$. Pre-multiplying by X gives $XX^T (Xw) = \lambda (Xw)$, so Xw is an eigenvector of $C = XX^T$! Computing all eigenvectors of C will now have a complexity of only $O(N^3)$ for computation of the eigenvectors of $X^T X$ plus $O(Nd)$ for computation of Xw from each w, a total of $O(N^3 + N^2 d)$, which is much less than $O(d^3)$. Note that C has at most only min(N-1, d) eigenvectors corresponding to non-zero eigenvalues (why?).

39 Eigenfaces: Algorithm (N << d case)
1. Compute the mean of the given points: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
2. Deduct the mean from each point: $\tilde{x}_i = x_i - \bar{x}$
3. Compute the following matrix: $L = X^T X$, where $X = [\tilde{x}_1 \; \ldots \; \tilde{x}_N] \in \mathbb{R}^{d \times N}$, $L \in \mathbb{R}^{N \times N}$. Note: L is a symmetric matrix and it is positive-semidefinite.

40 Eigen-faces: Algorithm (N << d case)
4. Find the eigenvectors of L: $LW = W\Gamma$, W = eigenvectors, Γ = eigenvalues, $W^T W = I$.
5. Obtain the eigenvectors of C from those of L: $V = XW$, $X \in \mathbb{R}^{d \times N}$, $W \in \mathbb{R}^{N \times N}$, $V \in \mathbb{R}^{d \times N}$.
6. Unit-normalize the columns of V.
7. C will have at most only N eigenvectors corresponding to non-zero eigenvalues*. Out of these you pick the top k (k < N) corresponding to the largest eigenvalues.
*Actually this number is at most N-1: this is due to the mean subtraction, else it would have been at most N.
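
A sketch of the N << d algorithm (hypothetical function name; same conventions as the earlier PCA sketch):

```python
import numpy as np

def pca_small_n(X, k):
    """Eigenfaces PCA for N << d. X is d x N, k < N."""
    x_bar = X.mean(axis=1, keepdims=True)
    X_tilde = X - x_bar
    L = X_tilde.T @ X_tilde                          # N x N instead of d x d
    gammas, W = np.linalg.eigh(L)                    # step 4: eigenvectors of L
    order = np.argsort(gammas)[::-1][:k]             # top-k eigenvalues of L
    V = X_tilde @ W[:, order]                        # step 5: X w are eigenvectors of C
    V /= np.linalg.norm(V, axis=0, keepdims=True)    # step 6: unit-normalize columns
    alpha = V.T @ X_tilde                            # eigen-coefficients
    return x_bar, V, alpha
```

The non-zero eigenvalues of L and of C coincide, so sorting the eigenvalues of L gives the correct top-k eigenfaces.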

41 Example A face database jh78_caj65/

42 Top 5 Eigen-faces for this database! jh78_caj65/

43 Example The Yale Face database

44 Top 5 eigenfaces from the previous database. Reconstruction of a face image using the top k eigenfaces, with k varied in steps of 8: $\tilde{x} \approx \hat{V}_k \alpha_k = \sum_{l=1}^{k} \hat{V}_k(:,l)\,\alpha_k(l)$
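
The reconstructions on this slide can be reproduced with a short sketch (hypothetical names; V and x_bar as returned by the training sketches above, with eigenvector columns sorted by decreasing eigenvalue):

```python
import numpy as np

def reconstruct(z, x_bar, V, k):
    """Approximate a flattened face image z using the top k eigenfaces."""
    alpha = V[:, :k].T @ (z - x_bar.ravel())       # eigen-coefficients
    return x_bar.ravel() + V[:, :k] @ alpha        # x_bar + sum_l V(:,l) * alpha(l)

# e.g. reconstructions for k increasing in steps of 8:
# recons = [reconstruct(z, x_bar, V, k) for k in range(1, V.shape[1] + 1, 8)]
```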

45 What if both N and d are large? This can happen, for example, if you wanted to build an eigenspace for face images of all people in Mumbai. Divide people into coherent groups based on some visual attributes (e.g.: gender, age group, etc.) and build separate eigenspaces for each group.

46 PCA: A closer look PCA has many applications apart from face/object recognition: in image processing/computer vision, statistics, econometrics, finance, agriculture, and you name it! Why PCA? What is special about PCA? See the next slides!

47 PCA: what does it do? It finds k perpendicular directions (all passing through the mean vector) such that the original data are approximated as accurately as possible when projected onto these k directions. We will see soon why these k directions are eigenvectors of the covariance matrix of the data!

48 PCA Look at this scatter-plot of points in 2D. The points are highly spread out in the direction of the light blue line.

49 PCA This is how the data would look if they were rotated in such a way that the major axis of the ellipse (the light blue line) now coincided with the Y axis. As the spread of the X coordinates is now relatively insignificant (observe the axes!), we can approximate the rotated data points by their projections onto the Y-axis (i.e. their Y coordinates alone!). This was not possible prior to rotation!

50 PCA As we could ignore the X-coordinates of the points post rotation and represent them just by the Y-coordinates, we have performed some sort of lossy data compression or dimensionality reduction. The job of PCA is to perform such a rotation as shown on the previous two slides!
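
The rotation intuition on slides 48 to 50 can be reproduced with a tiny synthetic example (purely illustrative, made-up 2D data):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(2, 500))                            # isotropic cloud
pts = np.array([[3.0, 0.0], [0.0, 0.3]]) @ pts             # stretch along one axis
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
pts = R @ pts                                              # tilt the ellipse

ctr = pts - pts.mean(axis=1, keepdims=True)
S = ctr @ ctr.T                                            # 2 x 2 scatter matrix
evals, evecs = np.linalg.eigh(S)
e1 = evecs[:, np.argmax(evals)]                            # direction of maximum spread
coords_1d = e1 @ ctr                                       # the "Y coordinates" after rotation
print(evals / evals.sum())                                 # dominant direction holds most variance
```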

51 PCA Aim of PCA: Find the line passing through the sample mean (i.e. $\bar{x}$), with unit direction e, such that the projection of any mean-deducted point onto e most accurately approximates it. The projection of $x_i - \bar{x}$ onto e is $a_i e$, where $a_i = e^T(x_i - \bar{x})$, $a_i \in \mathbb{R}$. Error of approximation: $\|a_i e - (x_i - \bar{x})\|^2$. Note: here e is a unit vector.

52 PCA Summing up over all points, we get the total error of approximation:
$J(a_1, \ldots, a_N, e) = \sum_{i=1}^{N} \|a_i e - (x_i - \bar{x})\|^2 = \sum_i a_i^2 \|e\|^2 - 2\sum_i a_i\, e^T(x_i - \bar{x}) + \sum_i \|x_i - \bar{x}\|^2$
Using $a_i = e^T(x_i - \bar{x})$ and $\|e\| = 1$, this becomes $J(e) = -\sum_i a_i^2 + \sum_i \|x_i - \bar{x}\|^2$.

53 PCA
$J(e) = -\sum_i \big(e^T(x_i - \bar{x})\big)^2 + \sum_i \|x_i - \bar{x}\|^2 = -e^T S e + \sum_i \|x_i - \bar{x}\|^2$, where $S = \sum_i (x_i - \bar{x})(x_i - \bar{x})^T$ (so $S = NC$).
The term $e^T S e$ is proportional to the variance of the data points when projected onto the direction e.

54 PCA
$J(e) = -e^T S e + (\text{a term independent of the direction } e)$
Minimizing J(e) w.r.t. e is equivalent to maximizing $e^T S e$ w.r.t. e. We use the method of Lagrange multipliers to do so, while simultaneously imposing the constraint that $e^T e = 1$ (see appendix for details). So we have to take the derivative of the following modified function w.r.t. e (and set it to 0):
$\tilde{J}(e) = e^T S e - \lambda(e^T e - 1)$
Taking the derivative of $\tilde{J}(e)$ w.r.t. e and setting it to 0, we get $Se = \lambda e$, so e is an eigenvector of S. As $e^T S e = \lambda\, e^T e = \lambda$ and we wish to maximize $e^T S e$, we choose e to be the eigenvector corresponding to the maximum eigenvalue of S.

55 PCA PCA thus projects the data onto the direction that minimizes the total squared difference between the data points and their respective projections along that direction. This equivalently yields the direction along which the spread (or variance) will be maximum. Why? Note that the eigenvalue of a covariance matrix tells you the variance of the data when projected along that particular eigenvector:
$Se = \lambda e \;\Rightarrow\; e^T S e = \lambda e^T e = \lambda$, and $e^T S e = \sum_i \big(e^T(x_i - \bar{x})\big)^2$, which is proportional to the variance of the data when projected along e.
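
The fact that the projected variance equals the eigenvalue, $e^T S e = \lambda$ for a unit eigenvector e, is easy to check numerically (illustrative sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 200))                    # toy data: d = 5, N = 200
Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T                                    # scatter matrix
lam, E = np.linalg.eigh(S)
e = E[:, -1]                                     # unit eigenvector with the largest eigenvalue
proj_var = np.sum((e @ Xc) ** 2)                 # sum of squared projections onto e
print(np.isclose(proj_var, lam[-1]))             # True: e^T S e = lambda
```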

56 PCA But for most applications (including face recognition), just a single direction is absolutely insufficient! We will need to project the data (from the high-dimensional, i.e. d-dimensional, space) onto k (k << d) different mutually perpendicular directions. What is the criterion for deriving these directions? We seek those k directions for which the total reconstruction error of all the images, when projected on those directions, is minimized.

57 PCA We seek those k directions for which the total reconstruction error of all the images, when projected on those directions, is minimized:
$J(\{e_j\}_{j=1}^{k}) = \sum_{i=1}^{N} \Big\| (x_i - \bar{x}) - \sum_{j=1}^{k} \big(e_j^T(x_i - \bar{x})\big)\, e_j \Big\|^2$
One can prove that these k directions will be the eigenvectors of the S matrix (equivalently, the covariance matrix of the data) corresponding to the k largest eigenvalues. These k directions form the eigen-space. If the eigenvalues of S are distinct, these k directions are defined uniquely (up to a sign factor).

58 PCA One can prove that these k directions will be the eigenvectors of the S matrix (equivalently, the covariance matrix of the data) corresponding to the k largest eigenvalues. These k directions form the eigen-space. Sketch of the proof: Assume we have found $e_1$ and are looking for $e_2$ (where $e_2$ is perpendicular to $e_1$ and $e_2$ has unit magnitude). Write out the objective function with the two constraints. Minimize it and do some algebra to see that $e_2$ is the eigenvector of S with the second largest eigenvalue. Proceed similarly for other directions.

59 How to pick k in an actual application? Trial and error. Usually between 50 and 100 for typical face image sizes. Divide your training set into two parts A and B (B is usually called the validation set). Pick the value of k that gives the best recognition rate on B when you train on A. Stick to that value of k. Note: a larger k implies a better reconstruction, but it may even cause a decrease in the recognition accuracy!
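
A sketch of the validation procedure for picking k (hypothetical names; reuses pca_small_n and identify from the earlier sketches):

```python
import numpy as np

def choose_k(X_A, labels_A, X_B, labels_B, candidate_ks):
    """Train on part A, pick the k giving the best recognition rate on part B."""
    best_k, best_rate = None, -1.0
    for k in candidate_ks:
        x_bar, V, alpha_A = pca_small_n(X_A, k)           # train on A
        preds = [identify(X_B[:, j], x_bar, V, alpha_A, labels_A)
                 for j in range(X_B.shape[1])]            # recognize B
        rate = np.mean([p == t for p, t in zip(preds, labels_B)])
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k
```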

60 How to pick k in an actual application? Note: a larger k implies a better reconstruction, but it may even cause a decrease in the recognition accuracy! Why? Because throwing out some of the eigenvectors may lead to filtering of the data, removing some unnecessary artifacts, for example.

61 Some observations about PCA for face images The matrix V (with all columns) is orthonormal. Hence the squared error between any image and its approximation using just the top k eigenvectors is given by:
$\|\tilde{x}_i - \hat{V}_k\alpha_{ik}\|^2 = \|V(\alpha_i - \tilde{\alpha}_i)\|^2 \;(\text{why?}) = \|\alpha_i - \tilde{\alpha}_i\|^2 \;(\text{why?}) = \sum_{j=k+1}^{d} \alpha_i(j)^2 \;(\text{why?})$
where $\tilde{\alpha}_i$ is $\alpha_i$ with the entries beyond the k-th set to zero. This error is small on average for a well-aligned group of face images; we will see why on the next slide.

62 The eigenvalues of the covariance matrix typically decay fast in value (if the faces were properly normalized). Note that the j-th eigenvalue is proportional to the variance of the j-th eigen-coefficient, i.e.
$\lambda_j = e_j^T S e_j = \sum_i \big(e_j^T(x_i - \bar{x})\big)^2 = \sum_i \alpha_i(j)^2$
What this means is that the data have low variance when projected along most of the eigenvectors, i.e. effectively the data are concentrated in a lower-dimensional subspace of the d-dimensional space.

63 Person-specific eigen-faces So far, we built one eigen-space for the whole database consisting of multiple images each of multiple people. Alternative approach: Construct one eigen-space (i.e. a set of some k eigenvectors) for each of the M people. Assume we have multiple gallery images per person, possibly under different poses, illumination conditions and expressions.

64 Person-specific eigen-space Let the eigen-space for the r-th person be denoted as $\hat{V}_k^{(r)}$ and the corresponding mean image be denoted as $\bar{x}^{(r)}$. The eigen-coefficients of a probe image onto the r-th eigen-space are given as: $\beta_p^{(r)} = \hat{V}_k^{(r)T}(z_p - \bar{x}^{(r)})$. For every r, determine the following: $d(r) = \|z_p - \bar{x}^{(r)} - \hat{V}_k^{(r)}\beta_p^{(r)}\|^2$, a measure of how well k eigenvectors from the eigen-space for person r are capable of reconstructing the probe image. Note that k ≤ the number of gallery images for the r-th person (why?).

65 Person-specific eigen-space An alternative distance measure is: $d(r) = \|\beta_p^{(r)} - \bar{\beta}^{(r)}\|^2$, where $\bar{\beta}^{(r)}$ is the average eigen-coefficient vector of all gallery images belonging to the r-th person. The identity of the probe image is given by the eigen-space r for which d(r) is the minimum.
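
A sketch of the person-specific reconstruction-error rule from slide 64 (hypothetical names; per_person is assumed to map each identity r to a d x N_r matrix of that person's gallery images, and pca_small_n is the earlier sketch):

```python
import numpy as np

def identify_person_specific(z_p, per_person, k):
    """Assign z_p to the person whose eigen-space reconstructs it best (k <= N_r for all r)."""
    best_r, best_err = None, np.inf
    for r, X_r in per_person.items():
        x_bar_r, V_r, _ = pca_small_n(X_r, k)                       # person r's eigen-space
        beta = V_r.T @ (z_p - x_bar_r.ravel())                      # coefficients in that space
        err = np.sum((z_p - x_bar_r.ravel() - V_r @ beta) ** 2)     # d(r): reconstruction error
        if err < best_err:
            best_r, best_err = r, err
    return best_r
```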

66 Face recognition under varied lighting The appearance variation over images of the same person under different illuminations is much greater than the variation over images of different people under the same lighting condition!

67 Face recognition under varied lighting Given a database of images of M people, each under L lighting conditions, create an eigenspace and remove the top 3 principal components for better face recognition! Why? We will understand the real reason in a computer vision class!

68 A word of clarification PCA is not magic. It has no in-built ability to perform pose normalization. For pose normalization, you can build pose-specific eigen-spaces or person-specific eigen-spaces using multiple images of a person under different poses. For (partial) illumination normalization, you can use the trick mentioned on the previous slide. However, in general, pose normalization is a major problem in face recognition from 2D images.

69 Another word of clarification PCA at its core is a reconstruction algorithm. It is not a classification algorithm and is not designed to be one. But it showed very good results for face recognition and hence became popular in this community. There are other methods (e.g.: Linear Discriminant Analysis) which are designed for the purpose of good classification on face images or other datasets.

70 PCA: Compression of a set of images Consider a database of N images that are similar (e.g.: all are face images, all are car images, etc.). Build an eigen-space from some subset of these images (could be all the images, as well). We know that these images can often be reconstructed very well (i.e. with low error) using just a few eigenvectors.

71 PCA: Compression of a set of images Use this fact for image compression. Original data storage = d pixels x N images = Nd bytes (assume one byte per pixel intensity) = 8Nd bits. After PCA: N x k eigen-coefficients at 32 bits each = 32Nk bits (remember k << d; example: d ~ 50,000 and k ~ 100). Plus storage of the eigenvectors = 32dk bits (remember k is much smaller than the number of images as well). Plus the mean image = 8d bits. Total: 32(N+d)k + 8d bits.

72 PCA: Compression of a set of images Example: N = 5000, d = 50000, k = 100. Original size/(size after PCA compression) ≈ 11.3. Note: we allocated 32 bits for every element of the eigenvector. This is actually very conservative and you can have further savings using several tricks.
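
The ratio quoted above follows directly from the bit counts on the previous slide; a quick check, using exactly those counts:

```python
N, d, k = 5000, 50000, 100
original_bits = 8 * N * d                           # one byte per pixel, N images
compressed_bits = 32 * N * k + 32 * d * k + 8 * d   # coefficients + eigenvectors + mean image
print(original_bits / compressed_bits)              # approximately 11.3
```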

73 PCA: Compression of a set of images This differs a lot from JPEG compression. JPEG uses the discrete cosine transform (DCT) as the basis. PCA tunes the basis to the underlying training data! If you change the training data, the basis changes! The performance of the PCA compression algorithm will depend on how compactly the eigen-space can represent the data. We will study image compression using JPEG and other standards later on in the course.

74 Face Recognition: Other types of images From range data, called 3D face recognition.

75 Face Recognition: Other types of images From video!

76 Face recognition: from the FBI

77 Conclusion We studied: Face recognition and related problem statements; the Eigenfaces algorithm, faster and slower versions; derivation of the algorithm; modifications for face detection/verification/person-specific eigenfaces; application for compression of images.

78 References Section 3.8 of Pattern Classification by Duda and Hart. M. Turk and A. Pentland (1991). "Eigenfaces for recognition". Journal of Cognitive Neuroscience, 3(1): 71-86.

79 Appendix: Covariance matrix In probability and statistics, the expected value of a scalar random variable X is called its mean: $\mu = E(X) = \int x\, p(x)\, dx$, where p(x) is the probability density function of X. The variance of a scalar random variable is the expected value of its squared deviation around the mean: $\sigma^2 = E\big((X-\mu)^2\big) = \int (x-\mu)^2 p(x)\, dx \approx \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$, where the $x_i$ are sample values of the random variable.

80 Appendix: Covariance matrix The expected value of a vector random variable X is called its mean vector: $\mu = E(X)$, where $X = (X(1), X(2), \ldots, X(d))$ and $\mu = (\mu(1), \mu(2), \ldots, \mu(d))$; $x_k$ denotes the k-th sample value of X.

81 Appendix: Covariance matrix The variance is now replaced by a covariance matrix. Each entry contains the covariance between two elements of the vector:
$C_{kl} = E\big((X(k) - \mu(k))(X(l) - \mu(l))\big), \quad C \in \mathbb{R}^{d \times d}$
where X(k) is the k-th element of the vector X.

82 Appendix: A word about Lagrange Multipliers Suppose you want to find the minimum or maximum of f(x, y). Normally, you take the derivative, set it to 0, and check the sign of the second derivative. Now you want to find the optimum subject to a constraint that h(x, y) = 0.

83 Appendix: A word about Lagrange Multipliers Physical analogy: if you drop a pebble into a parabola-shaped bowl, the pebble will fall to the bottom-most point. But if you placed a wooden plank into the bowl, the pebble will settle somewhere on the plank!

84 ange-multpler.pdf Suppose we are walking from $(x_1, y_1)$ to $(x_2, y_2)$ along the constraint curve h(x, y) = 0. Initially, the tangent vector along h(x, y) = 0 has a component along grad(f), and hence there is a decrease in the value of f(x, y) as we move along h(x, y) = 0. If the point (x*, y*) is a local minimum of f(x, y), a small motion along h(x, y) = 0 from (x*, y*) will have no component along grad(f), else it would cause an increase in the value of f. Hence the tangent to h(x, y) = 0 at (x*, y*) will be perpendicular to grad(f). As we move further, motion along h(x, y) = 0 will have a component along +grad(f), leading to an increase in the value of f(x, y). At the minimum point, we have grad(f) perpendicular to the tangent, i.e. grad(f) and grad(h) are collinear. Hence, there exists some value λ (the Lagrange multiplier) such that:
$\nabla f(x^*, y^*) = \lambda\, \nabla h(x^*, y^*), \quad \text{i.e.} \quad \nabla\big(f(x^*, y^*) - \lambda\, h(x^*, y^*)\big) = 0$

85 If (x', y') is the extremum of f(x, y), then an infinitesimal perturbation of (x', y') will lead to no change in the value of f. Hence we have $\nabla f(x', y') = 0$. But here we are constrained to move only along the curve h(x, y) = 0. Let s be the tangent to this curve at a point (x', y'). The change in f due to an infinitesimal movement along the curve is given by $\nabla f(x', y') \cdot s$. If (x', y') is an extremum of f along this curve, then we must have $\nabla f(x', y') \cdot s = 0$. But the normal to the curve is $\nabla h(x', y')$ and it is perpendicular to s. Hence $\nabla f(x', y')$ is collinear with $\nabla h(x', y')$, i.e. $\nabla f(x', y') = \lambda\, \nabla h(x', y')$ for some value λ. In our derivation for PCA, instead of f(x, y) we have $e^T S e$, and instead of h(x, y) = 0 we have $e^T e - 1 = 0$. Note that e is a vector in d-dimensional space.
