Recognition by generation. Devika Subramanian Comp 140 Fall 2008



2 Four problems
Authorship attribution: Is this play written by Shakespeare? Is this Federalist Papers excerpt written by Hamilton?
Face detection: Is there a human face in this image?
Digit recognition: Is this handwritten digit a 2?
Gene recognition: Does this DNA sequence contain a gene?

3 A single approach: authorship attribution. How closely does the new manuscript resemble anything in the collected works?
[Figure: a new manuscript compared against the collected works of Shakespeare]

4 A single approach: face detection. How closely does the new image resemble the face images?
[Figure: a new image compared against a database of images containing faces and non-faces]

5 A single approach: digit recognition. How closely does this new digit resemble a 2?
[Figure: a new digit compared against a database of handwritten digits]

6 A single approach: gene recognition. How closely does the new sequence resemble anything in the database of genes?
[Figure: a new DNA sequence compared against a database of genes in human DNA]

7 The question. How closely does the new object resemble any of the objects in the collection we have seen before?

8 Generative models. A generative model specifies a probability distribution over the collection. In principle, it can generate the collection and its variants, or riffs.
[Figure: a generative model built from a collection]

9 Generative models. Given a new object, the generative model gives the probability that the new object could have been generated by the model. "Resembles closely" is thus translated as: how probable (likely) is it that the new object could have been generated by the model?

10 Probability theory. Probability theory assigns a numerical degree of belief, between 0 and 1, to states of affairs. Example: $P(\text{I have a headache}) = 0.7$ states that I have a headache 70% of the time. This probability can be estimated as the fraction of days in a year on which I have a headache.

11 Conditional probability in pictures. $H$ = I have a headache; $F$ = I'm coming down with the flu. $P(H) = 0.1$, $P(F) = 0.025$, $P(H \mid F) = 0.5$.
[Figure: overlapping regions for H and F, not to scale]

12 Conditional probability. $P(A \mid B)$ is the probability of proposition $A$ given that all we know is $B$.

13 Making inferences. $P(H) = 0.1$, $P(F) = 0.025$, $P(H \mid F) = 0.5$. I have a headache ($H$). What is the probability that I am coming down with the flu ($F$)? $P(F \mid H) = ?$

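14 Bayes rule. $\Pr(F \mid H) = \dfrac{\Pr(H \mid F)\,\Pr(F)}{\Pr(H)}$, where $\Pr(H \mid F)$ is the likelihood and $\Pr(F)$ is the prior.
[Image: Reverend Thomas Bayes]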
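
To make the arithmetic concrete, here is a minimal Python sketch that plugs the numbers from slide 11 ($P(H) = 0.1$, $P(F) = 0.025$, $P(H \mid F) = 0.5$) into Bayes rule. The helper name `bayes_posterior` is our own, not from the lecture.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes rule: Pr(F|H) = Pr(H|F) * Pr(F) / Pr(H)."""
    return likelihood * prior / evidence

# Values taken from slide 11.
p_h = 0.1           # Pr(H): I have a headache
p_f = 0.025         # Pr(F): I'm coming down with the flu
p_h_given_f = 0.5   # Pr(H|F): likelihood of a headache given flu

p_f_given_h = bayes_posterior(p_h_given_f, p_f, p_h)
print(f"P(F|H) = {p_f_given_h:.3f}")  # prints P(F|H) = 0.125
```

Even though half of flu cases come with a headache, a headache alone raises the chance of flu only to about 12.5%, because headaches are four times as common as flu in this example.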

15 Generative models. Consider the set of all written works: a large, finite set in which each element is a book or a piece of text. Given a piece of text $d$, a generative model $m$ can compute $P(d \mid m)$ for every $d$ in this space.

16 The big idea. Build one generative model per collection (Collection 1: Shakespeare, giving generative model 1; Collection 2: Ibsen, giving generative model 2; ...; Collection n: Thoreau, giving generative model n). Each model produces a probability of generating the new document $d$; this is called the likelihood of the document with respect to the model: $P(d \mid m_1), \ldots, P(d \mid m_n)$.

17 The big idea (contd.). We can calculate the probability of the document $d$ given a model $m$: $\Pr(d \mid m)$. What we want is the probability of the model being the generator of the document: $\Pr(m \mid d)$. Here $m$ is the authorship attribution and $d$ is the evidence from the new document.

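18 Calculating $\Pr(m \mid d)$. By Bayes rule, $\Pr(m \mid d) = \dfrac{\Pr(d \mid m)\,\Pr(m)}{\Pr(d)}$, where $\Pr(d \mid m)$ is the likelihood and $\Pr(m)$ is the prior. Recipe: given the choices of $m$, compute $\Pr(m \mid d)$ for each and pick the model with the highest value. Since $\Pr(d)$ is the same for every model, it suffices to compare $\Pr(d \mid m)\,\Pr(m)$.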
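
A minimal Python sketch of this recipe, assuming each model has already produced a log-likelihood for the document. Working with logs is our choice, not the lecture's: $\Pr(d \mid m)$ for a long text is a product of many small numbers and underflows quickly in floating point. The function name `pick_model` and all the numbers below are hypothetical.

```python
import math

def pick_model(log_likelihoods: dict[str, float], priors: dict[str, float]) -> str:
    """Return the model m with the highest Pr(m|d).

    Pr(d) is shared by all models, so maximizing log Pr(d|m) + log Pr(m)
    picks the same model as maximizing Pr(m|d).
    """
    return max(log_likelihoods, key=lambda m: log_likelihoods[m] + math.log(priors[m]))

# Hypothetical log-likelihoods and priors, for illustration only.
log_lik = {"Shakespeare": -120.0, "Ibsen": -125.0, "Thoreau": -119.5}
priors = {"Shakespeare": 0.5, "Ibsen": 0.25, "Thoreau": 0.25}
print(pick_model(log_lik, priors))  # Shakespeare
```

With these made-up numbers Thoreau has the highest likelihood, but Shakespeare's larger prior tips the decision; with uniform priors the answer would be Thoreau.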

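19 Ham/spam detection. We have generative models of ham and spam that are trained over your emails, i.e., they calculate $P(\text{email} \mid \text{ham})$ and $P(\text{email} \mid \text{spam})$. The spam filter calculates $P(\text{ham} \mid \text{email})$ and $P(\text{spam} \mid \text{email})$ using Bayes rule and the priors on ham and spam, $P(\text{ham})$ and $P(\text{spam})$.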

20 Ham/spam detection. Decision rule: if $\Pr(\text{ham} \mid d) > \Pr(\text{spam} \mid d)$, accept the mail; otherwise mark it as SPAM.
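
The decision rule can be sketched end to end in Python. This is a toy under stated assumptions, not the lecture's filter: each class is modeled by a word histogram (the zeroth-order model introduced in slide 23), we add add-one smoothing so that a word never seen in one class does not force that class's probability to 0, and the training emails are hypothetical.

```python
import math
from collections import Counter

def train_unigram(docs: list[str]) -> tuple[Counter, int]:
    """Count word occurrences across one class's training emails."""
    counts = Counter(w for doc in docs for w in doc.lower().split())
    return counts, sum(counts.values())

def log_likelihood(email: str, counts: Counter, total: int, vocab_size: int) -> float:
    """log P(email|class) under a word-histogram model with add-one smoothing."""
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in email.lower().split())

def classify(email: str, ham_docs: list[str], spam_docs: list[str], p_ham: float = 0.5) -> str:
    ham_counts, ham_total = train_unigram(ham_docs)
    spam_counts, spam_total = train_unigram(spam_docs)
    vocab_size = len(set(ham_counts) | set(spam_counts))
    # Bayes rule in log space; the shared denominator P(email) cancels.
    score_ham = log_likelihood(email, ham_counts, ham_total, vocab_size) + math.log(p_ham)
    score_spam = log_likelihood(email, spam_counts, spam_total, vocab_size) + math.log(1 - p_ham)
    # Decision rule from the slide: accept only if ham is strictly more probable.
    return "accept" if score_ham > score_spam else "SPAM"

# Hypothetical training emails, for illustration only.
ham = ["meeting notes attached", "lunch meeting tomorrow"]
spam = ["win free money now", "free prize claim now"]
print(classify("free money", ham, spam))        # SPAM
print(classify("meeting tomorrow", ham, spam))  # accept
```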

21 Feature representations. We will represent a collection of objects by a probability distribution over a set of features that describe the objects. Examples of features: in text, individual words; in images, image patches, edges, or pixel intensities; in DNA sequences, the individual nucleotides.

22 Text features. Text collection: "She loves you yeh yeh yeh". The features are the words: She, loves, you, yeh.

23 A simple generative model: the zeroth-order Markov chain. It associates with each word feature the probability of its occurrence in the text: She: 1/6, loves: 1/6, you: 1/6, yeh: 3/6 = 1/2.
[Figure: histogram of these probabilities over She, loves, you, yeh]
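
Building this zeroth-order model amounts to counting words and dividing by the total. A minimal Python sketch, using exact fractions so the output matches the slide; the function name `word_histogram` is ours, and splitting on whitespace with case preserved is an assumption about tokenization.

```python
from collections import Counter
from fractions import Fraction

def word_histogram(text: str) -> dict[str, Fraction]:
    """Zeroth-order model: relative frequency of each word in the text."""
    words = text.split()
    return {w: Fraction(c, len(words)) for w, c in Counter(words).items()}

model = word_histogram("She loves you yeh yeh yeh")
print(model)
# {'She': Fraction(1, 6), 'loves': Fraction(1, 6), 'you': Fraction(1, 6), 'yeh': Fraction(1, 2)}
```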

24 Modeling collections as probability distributions over features. Given a large collection of texts, we can represent the collection as a discrete probability distribution over the words in it.
[Figure: word histogram over She, loves, you, yeh]

25 Likelihood of generation. Given a piece of text, say "She you loves", we can calculate the probability of its generation by this simple model: $\Pr(\text{She you loves}) = \Pr(\text{She}) \cdot \Pr(\text{you}) \cdot \Pr(\text{loves}) = \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{216}$.

26 Likelihood of generation. Given a piece of text, say "yeh yeh yeh", we can calculate the probability of its generation by this simple model: $\Pr(\text{yeh yeh yeh}) = \Pr(\text{yeh}) \cdot \Pr(\text{yeh}) \cdot \Pr(\text{yeh}) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$. Since $\frac{1/8}{1/216} = 27$, the text snippet "yeh yeh yeh" is 27 times more likely to have been generated by the model than the snippet "She you loves".
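
Both likelihoods, and their ratio, can be checked with a short Python sketch. Under the zeroth-order model each word is drawn independently, so the snippet's probability is the product of the per-word probabilities. The helper names are ours, and giving unseen words probability 0 is an assumption the slides do not address.

```python
from collections import Counter
from fractions import Fraction
from math import prod

def word_histogram(text: str) -> dict[str, Fraction]:
    """Zeroth-order model: relative frequency of each word in the text."""
    words = text.split()
    return {w: Fraction(c, len(words)) for w, c in Counter(words).items()}

def likelihood(snippet: str, model: dict[str, Fraction]) -> Fraction:
    """Product of per-word probabilities; unseen words get probability 0."""
    return prod((model.get(w, Fraction(0)) for w in snippet.split()), start=Fraction(1))

model = word_histogram("She loves you yeh yeh yeh")
p1 = likelihood("She you loves", model)
p2 = likelihood("yeh yeh yeh", model)
print(p1, p2, p2 / p1)  # 1/216 1/8 27
```

Note that this model ignores word order: "She you loves" and "She loves you" get the same probability.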

27 A more complex model: the first-order Markov chain. We can do better than a simple histogram over words: we can build a model that takes word order into account. To do this, we need a conditional probability distribution $\Pr(\text{word}[i] = x \mid \text{word}[i-1] = y)$.

28 Example. Text: "She loves you yeh yeh yeh".
[Table: transition probabilities, rows indexed by word i-1 and columns by word i, both ranging over She, loves, you, yeh]
Start probabilities: She: 1; loves, you, yeh: 0.
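
The values in the slide's transition table did not survive extraction. The Python sketch below re-derives a table from the example text, assuming maximum-likelihood estimation (count each adjacent word pair, then normalize per previous word); it may not match the original slide exactly. The function name `estimate_chain` is hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def estimate_chain(sentences: list[list[str]]):
    """Maximum-likelihood start and transition probabilities from word sequences."""
    start_counts = Counter(s[0] for s in sentences if s)
    pair_counts: dict[str, Counter] = defaultdict(Counter)  # pair_counts[prev][cur]
    for s in sentences:
        for prev, cur in zip(s, s[1:]):
            pair_counts[prev][cur] += 1
    n_starts = sum(start_counts.values())
    start = {w: Fraction(c, n_starts) for w, c in start_counts.items()}
    trans = {
        prev: {cur: Fraction(c, sum(nxt.values())) for cur, c in nxt.items()}
        for prev, nxt in pair_counts.items()
    }
    return start, trans

start, trans = estimate_chain([["She", "loves", "you", "yeh", "yeh", "yeh"]])
print(start)  # {'She': Fraction(1, 1)}
for prev, row in trans.items():
    print(prev, {cur: str(p) for cur, p in row.items()})
```

Words missing from `start` or from a row of `trans` have probability 0. The start probabilities agree with the slide (She: 1, all others 0); with a single training sentence, every observed transition gets probability 1, including yeh to yeh.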

29 First-order Markov chains. A first-order Markov chain is specified by start probabilities $\pi(w)$ over all word features $w$ in the collection, and transition probabilities $a(w_i \mid w_j)$ for word features $w_i, w_j$ in the collection.

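30 Using the model. Given a word sequence $w_0 w_1 \ldots w_n$, we calculate its probability of generation as $\pi(w_0)\, a(w_1 \mid w_0) \cdots a(w_n \mid w_{n-1})$. For "She loves you yeh": $\pi(\text{She})\, a(\text{loves} \mid \text{She})\, a(\text{you} \mid \text{loves})\, a(\text{yeh} \mid \text{you}) = ?$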
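
A minimal Python sketch of the product formula. The parameters are assumptions: the start probabilities come from slide 28, and the transitions are maximum-likelihood estimates from the single sentence "She loves you yeh yeh yeh", which may differ from the values on the original slide. The function name `sequence_probability` is hypothetical.

```python
from fractions import Fraction

def sequence_probability(words: list[str], start: dict, trans: dict) -> Fraction:
    """pi(w0) * a(w1|w0) * ... * a(wn|w_{n-1}); missing entries count as 0."""
    if not words:
        raise ValueError("need at least one word")
    p = start.get(words[0], Fraction(0))
    for prev, cur in zip(words, words[1:]):
        p *= trans.get(prev, {}).get(cur, Fraction(0))
    return p

# Assumed parameters (see lead-in); omitted entries have probability 0.
start = {"She": Fraction(1)}
trans = {
    "She": {"loves": Fraction(1)},
    "loves": {"you": Fraction(1)},
    "you": {"yeh": Fraction(1)},
    "yeh": {"yeh": Fraction(1)},
}
print(sequence_probability(["She", "loves", "you", "yeh"], start, trans))  # 1
print(sequence_probability(["She", "you", "loves"], start, trans))         # 0
```

Unlike the zeroth-order model, which gave "She you loves" probability 1/216, this chain gives it probability 0 under these parameters, because "you" never follows "She" in the training text. That is the sense in which the first-order model takes word order into account.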

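31 Markov models on DNA. There is one state for each of the four letters A, C, G, and T in the DNA alphabet. From a set of known CpG islands and non-CpG regions, estimate the transition probabilities.
[Figure: a Markov chain over the states A, C, G, T, with transitions labeled $a_{AC}$, $a_{AT}$, $a_{GC}$, $a_{GT}$, and so on]
[Tables: 4×4 transition probability matrices over A, C, G, T for the CpG-island (+) model and the non-CpG model]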

32 Computing likelihoods

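33 Recognition using Bayes rule.
$P(\text{CpG} \mid X = \text{ATGA}) = P(X = \text{ATGA} \mid \text{CpG})\, P(\text{CpG}) / P(X = \text{ATGA})$
$P(\text{nonCpG} \mid X = \text{ATGA}) = P(X = \text{ATGA} \mid \text{nonCpG})\, P(\text{nonCpG}) / P(X = \text{ATGA})$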
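
The shared denominator $P(X = \text{ATGA})$ can be computed by the law of total probability, assuming CpG island and non-CpG region are the only two hypotheses. A minimal Python sketch; the function name and the likelihood values are hypothetical, chosen only to illustrate the calculation.

```python
def posterior_cpg(lik_cpg: float, lik_non: float, prior_cpg: float) -> float:
    """P(CpG|X) by Bayes rule.

    The denominator is expanded with the law of total probability:
    P(X) = P(X|CpG) P(CpG) + P(X|nonCpG) P(nonCpG), with P(nonCpG) = 1 - P(CpG).
    """
    evidence = lik_cpg * prior_cpg + lik_non * (1.0 - prior_cpg)
    return lik_cpg * prior_cpg / evidence

# Hypothetical likelihoods for X = ATGA, for illustration only.
p = posterior_cpg(lik_cpg=0.002, lik_non=0.006, prior_cpg=0.5)
print(round(p, 3), round(1 - p, 3))  # 0.25 0.75
```

Because the two posteriors share the same denominator, comparing them only requires the numerators; the denominator matters when an actual probability is wanted.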

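34 Making inferences. Is the sequence more likely to come from a CpG island or a non-CpG region? Compare the two models with the log-odds ratio $\log \dfrac{P(X \mid \text{CpG})}{P(X \mid \text{nonCpG})}$: a positive value means $X$ is more likely under the CpG model, and a negative value means it is more likely under the non-CpG model.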
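
The log-odds test can be sketched in Python with two first-order Markov chains over A, C, G, T. Several assumptions apply: the training sequences below are hypothetical toy data, not real CpG islands; the pseudocount of 1 per transition is our choice, to avoid zero probabilities; and both models are assumed to share the same start distribution, so only the transition terms contribute to the score.

```python
import math
from itertools import product

ALPHABET = "ACGT"

def transition_table(seqs: list[str], pseudocount: float = 1.0) -> dict[tuple[str, str], float]:
    """Estimate a(t|s) for a first-order chain over A, C, G, T.

    Counts each adjacent pair (s, t), adds a pseudocount to every cell,
    and normalizes each row s so it sums to 1.
    """
    counts = {(s, t): pseudocount for s, t in product(ALPHABET, repeat=2)}
    for seq in seqs:
        for s, t in zip(seq, seq[1:]):
            counts[(s, t)] += 1
    table = {}
    for s in ALPHABET:
        row_total = sum(counts[(s, t)] for t in ALPHABET)
        for t in ALPHABET:
            table[(s, t)] = counts[(s, t)] / row_total
    return table

def log_odds(x: str, plus: dict, minus: dict) -> float:
    """log P(x|CpG) - log P(x|nonCpG), summed over the transitions of x."""
    return sum(math.log(plus[(s, t)] / minus[(s, t)]) for s, t in zip(x, x[1:]))

# Hypothetical toy training data, for illustration only.
plus = transition_table(["CGCGGCGCCG", "GCGCGCGTCG"])
minus = transition_table(["ATTAGCATTA", "TTAATGACAT"])
for x in ["ATGA", "CGCG"]:
    score = log_odds(x, plus, minus)
    print(x, f"{score:+.3f}", "CpG" if score > 0 else "non-CpG")
```

With this toy data, ATGA gets a negative score (non-CpG) and CGCG a positive one (CpG). Summing logs rather than multiplying raw probabilities also keeps long sequences from underflowing.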
