EECS E6870: Lecture 6: Pronunciation Modeling and Decision Trees


1 EECS E6870: Lecture 6: Pronunciation Modeling and Decision Trees. Stanley F. Chen, Michael A. Picheny and Bhuvana Ramabhadran. IBM T. J. Watson Research Center, Yorktown Heights, NY. October 3, 2009. EECS E6870: Speech Recognition

2 In the beginning was the whole-word model. For each word in the vocabulary, decide on a topology. Often the number of states in the model is chosen to be proportional to the number of phonemes in the word. Train the observation and transition parameters for a given word using examples of that word in the training data (recall Problem 3 associated with Hidden Markov Models). Good domain for this approach: digits.

3 Example topologies: Digits. Vocabulary consists of { zero, oh, one, two, three, four, five, six, seven, eight, nine }. Assume we assign two states per phoneme. Must allow for different durations: use self-loops and skip arcs. Models look like: [HMM topology diagrams for "zero" and "oh"]

4 [HMM topology diagrams for "one", "two", "three", "four", "five"]

5 [HMM topology diagrams for "six", "seven", "eight", "nine"]

6 How to represent any sequence of digits?

7 [Diagram: the digit-word models connected in a loop, representing any sequence of digits]

8 Trellis Representation. [Diagram: model states on the vertical axis, time on the horizontal axis]

9 Whole-word model limitations. The whole-word model suffers from two main problems: 1. It cannot model unseen words. In fact, we need several samples of each word to train the models properly; it cannot share data among models (the data sparseness problem). 2. The number of parameters in the system is proportional to the vocabulary size. Thus, whole-word models are best on small-vocabulary tasks.

10 Subword Units. To reduce the number of parameters, we can compose word models from sub-word units. These units can be shared among words. Examples include (unit: approximate number): Phones: 50; Diphones: 2,000; Triphones: 10,000; Syllables: 5,000. Each unit is small. The number of parameters is proportional to the number of units (not the number of words in the vocabulary as in whole-word models).

11 Phonetic Models. We represent each word as a sequence of phonemes. This representation is the baseform for the word, e.g. BANDS: B AE N D Z. Some words need more than one baseform, e.g. THE: DH UH and DH IY.

12 Baseform Dictionary. To determine the pronunciation of each word, we look it up in a dictionary. Each word may have several possible pronunciations. Every word in our training script and test vocabulary must be in the dictionary. The dictionary is generally written by hand, and is prone to errors and inconsistencies. Annotations on the excerpt below point out: the 2nd pronunciation looks wrong; an AA K .. variant is missing; shouldn't the choices for the 1st phoneme be the same for these words? Don't these words all start the same?

acapulco: AE K AX P AH L K OW
acapulco: AE K AX P UH K OW
accelerator: AX K S EH L AX R EY DX ER
accelerator: IX K S EH L AX R EY DX ER
acceleration: AX K S EH L AX R EY SH IX N
acceleration: AE K S EH L AX R EY SH IX N
accent: AE K S EH N T
accept: AX K S EH P T
acceptable: AX K S EH P T AX B AX L
access: AE K S EH S
accessory: AX K S EH S AX R IY
accessory: EH K S EH S AX R IY

13 Phonetic Models, cont'd. We can allow for phonological variation by representing baseforms as graphs. For example, the two baseforms of acapulco, AE K AX P AH L K OW and AA K AX P UH K OW, become a single graph with AE or AA as the first phone and AH L or UH in the middle.

14 Phonetic Models, cont'd. Now, construct a Markov model for each phone. Examples: [diagrams of phone Markov model topologies of varying length]

15 Embedding. Replace each phone by its Markov model to get a word model. N.B. the model for each phone will have different parameter values. [Diagram: the graph AE/AA K AX P AH-L/UH K OW with each phone arc replaced by its phone model]

16 Reducing Parameters by Tying. Consider the three-state model with transitions t1, t2, t3, t4, t5, t6. Note that t1 and t2 correspond to the beginning of the phone, t3 and t4 correspond to the middle of the phone, and t5 and t6 correspond to the end of the phone. If we force the output distributions for each member of those pairs to be the same, then the training data requirements are reduced.

17 Tying. A set of arcs in a Markov model are tied to one another if they are constrained to have identical output distributions. Similarly, states are tied if they have identical transition probabilities. Tying can be explicit or implicit.

18 Implicit Tying. Occurs when we build up models for larger units from models of smaller units. Example: when word models are made from phone models. First, consider an example without any tying. Let the vocabulary consist of the digits 0, 1, ..., 9. We can make a separate model for each word. To estimate parameters for each word model, we need several samples for each word. Samples of 0 affect only the parameters of the 0 model.

19 Implicit Tying, cont'd. Now consider phone-based models for this vocabulary:

0 Z IY R OW
1 W AA N
2 T UW
3 TH R IY
4 F AO R
5 F AY V
6 S IH K S
7 S EH V AX N
8 EY T
9 N AY N

Training samples of 0 will now also affect the models for 3 and 4, which share phones with 0. This is useful in large vocabulary systems, where the number of words is much greater than the number of phones.

20 Explicit Tying. Example: [diagram: a phone model with arcs labeled b, b, m, m, e, e]. There are 6 non-null arcs, but only 3 different output distributions because of tying. The number of model parameters is reduced. Tying saves storage because only one copy of each distribution is saved. Fewer parameters mean less training data is needed.

21 Variations in realizations of phonemes. The broad units, phonemes, have variants known as allophones. Example: in English, p and pʰ (unaspirated and aspirated p). Exercise: put your hand in front of your mouth and pronounce "spin" and then "pin". Note that the p in "pin" has a puff of air, while the p in "spin" does not. Articulators have inertia; thus the pronunciation of a phoneme is influenced by surrounding phonemes. This is known as co-articulation. Example: consider k and g in different contexts. In "key" and "geese" the whole body of the tongue has to be pulled up to make the vowel, and the closure of the k moves forward compared to "caw" and "gauze". Phonemes have canonical articulator target positions that may or may not be reached in a particular utterance.

22 [Figure: articulation of "keep"]

23 [Figure: articulation of "coop"]

24 Context-dependent models. We can model phones in context. Two approaches: triphones and leaves. Both methods use clustering: triphones use bottom-up clustering, leaves use top-down clustering. Typical improvements of speech recognizers when introducing context dependence: 30%-50% fewer errors.

25 Tri-phone models. Model each phoneme in the context of its left and right neighbor. E.g., K-IY+P is a model for IY when K is its left-context phoneme and P is its right-context phoneme. If we have 50 phonemes in a language, we could have as many as 50³ triphones to model. Not all of these occur, but we still have data sparsity issues. We try to solve these issues by agglomerative clustering.

26 Agglomerative / Bottom-up Clustering. Start with each item in a cluster by itself. Find the closest pair of items, merge them into a single cluster, and iterate. Different results are obtained based on the distance measure used. Single-link: dist(A,B) = min dist(a,b) for a ∈ A, b ∈ B. Complete-link: dist(A,B) = max dist(a,b) for a ∈ A, b ∈ B.
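As a minimal sketch of the procedure just described, the following merges clusters of 2-D Euclidean points under either linkage; the function names (dist, linkage, agglomerate) are illustrative, not from the lecture.

```python
# Bottom-up clustering sketch with pluggable single/complete linkage.
from itertools import combinations

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def linkage(A, B, mode="single"):
    d = [dist(a, b) for a in A for b in B]
    return min(d) if mode == "single" else max(d)  # complete-link uses max

def agglomerate(points, k, mode="single"):
    clusters = [[p] for p in points]          # start: each item is its own cluster
    while len(clusters) > k:
        # find the closest pair of clusters under the chosen linkage
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]], mode))
        clusters[i] += clusters.pop(j)        # merge and iterate
    return clusters

print(agglomerate([(0, 0), (0, 1), (5, 5), (6, 5)], k=2, mode="complete"))
```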

27 Bottom-up clustering / Single Link. Assume our data points look like: [scatter plot]. Single-link: clusters are close if any of their points are, i.e. dist(A,B) = min dist(a,b) for a ∈ A, b ∈ B. [Figure: single-link clustering into two groups.] Single-link clustering tends to yield long, stringy, meandering clusters.

28 Bottom-up clustering / Complete Link. Again, assume our data points look like: [scatter plot]. Complete-link: clusters are close only if ALL of their points are, i.e. dist(A,B) = max dist(a,b) for a ∈ A, b ∈ B. [Figure: complete-link clustering into two groups.] Complete-link clustering tends to yield roundish clusters.

29 Dendrogram. A natural way to display clusters is through a dendrogram: it shows the clusters on the x-axis and the distance between clusters on the y-axis. It provides some guidance as to a good choice for the number of clusters. [Figure: dendrogram over points x1 ... x8 with a distance axis]

30 Triphone Clustering. We can use e.g. complete-link clustering to cluster triphones. This helps with the data sparsity issue, but we still have an issue with unseen data: to model unseen events, we need to back off to lower-order models such as bi-phones and uni-phones.

31 Allophones. Pronunciation variations of phonemes; less rigidly defined than triphones. In our speech recognition system, we typically have about 50 allophones per phoneme. Older techniques ("tri-phones") tried to enumerate by hand the set of allophones for each phoneme. We currently use automatic methods to identify the allophones of a phoneme. This boils down to conditional modeling.

32 Conditional Modeling. In speech recognition, we use conditional models and conditional statistics. Acoustic model: conditioned on the phone context. Spelling-to-sound rules: describe the pronunciation of a letter in terms of a probability distribution that depends on neighboring letters and their pronunciations; the distribution is conditioned on the letter context and pronunciation context. Language model: the probability distribution for the next word is conditioned on the preceding words. One way to define the set of conditions is a class of statistical models known as decision trees. Classic text: L. Breiman et al., Classification and Regression Trees. Wadsworth & Brooks, Monterey, California, 1984.

33 Decision Trees. We would like to find equivalence classes among our training samples. DTs categorize data into multiple disjoint categories. The purpose of a decision tree is to map conditions (such as phonetic contexts) into equivalence classes; the goal in constructing a decision tree is to create good equivalence classes. DTs are examples of divisive / top-down clustering. Each internal node specifies an attribute that an object must be tested on; each leaf node represents a classification.

34 What does a decision tree look like? [Figure: an example decision tree]

35 Types of Attributes/Features. Numerical: the domain is ordered and can be represented on the real line (e.g., age, income). Nominal or categorical: the domain is a finite set without any natural ordering (e.g., occupation, marital status, race). Ordinal: the domain is ordered, but absolute differences between values are unknown (e.g., preference scale, severity of an injury).

36 The Classification Problem. If the dependent variable is categorical, the problem is a classification problem. Let C be the class label of a given data point X = X1, ..., Xk, and let d(·) be the predicted class label. Define the misclassification rate of d as Prob(d(X1, ..., Xk) ≠ C). Problem definition: given a dataset, find the classifier d such that the misclassification rate is minimized.

37 The Regression Problem. If the dependent variable is numerical, the problem is a regression problem. The tree d maps observation X to a prediction Ŷ of Y and is called a regression function. Define the mean squared error rate of d as E[(Y − d(X1, ..., Xk))²]. Problem definition: given a dataset, find the regression function d such that the mean squared error is minimized.

38 Goals & Requirements. Goals: to produce an accurate classifier/regression function, and to understand the structure of the problem. Requirements on the model: high accuracy; understandable by humans, interpretable; fast construction for very large training databases.

39 Decision Trees: Letter-to-Sound Example. Let's say we want to build a tree to decide how the letter p will sound in various words. Training examples:

p: loophole, peanuts, pay, apple
f: physics, telephone, graph, photo
φ (silent): apple, psycho, pterodactyl, pneumonia

The pronunciation of p depends on its context. Task: using the above training data, partition the contexts into equivalence classes so as to minimize the uncertainty of the pronunciation.

40 Decision Trees: Letter-to-Sound Example, cont'd. Denote the letters to the left of the p by L1, L2 and the letters to the right by R1, R2.

R1 = h?
Yes: p: loophole; f: physics, telephone, graph, photo
No: p: peanut, pay, apple; φ: apple, psycho, pterodactyl, pneumonia

At this point we have two equivalence classes: 1. R1 = h, and 2. R1 ≠ h. The pronunciation of class 1 is either p or f, with f much more likely than p. The pronunciation of class 2 is either p or φ.

41 Splitting further:

R1 = h?
Yes → L1 = o?
  Yes → class 1: p: loophole
  No → class 2: f: physics, telephone, graph, photo
No → R1 = consonant?
  Yes → class 3: p: apple; φ: apple, psycho, pterodactyl, pneumonia
  No → class 4: p: peanut, pay

Four equivalence classes. Uncertainty remains only in class 3.

42 Splitting class 3 with L1 = a? separates the two p's of apple:

R1 = h?
Yes → L1 = o?
  Yes → class 1: p: loophole
  No → class 2: f: physics, telephone, graph, photo
No → R1 = consonant?
  Yes → L1 = a?
    Yes → class 3: p: apple
    No → class 4: φ: apple, psycho, pterodactyl, pneumonia
  No → class 5: p: peanut, pay

Five equivalence classes, which is much less than the number of letter contexts. No uncertainty is left in the classes. A node without children (an equivalence class) is called a leaf node; otherwise it is called an internal node.

43 Consider the test case "Paris". For its p, R1 = a: R1 = h? No. R1 = consonant? No. This leads to class 5 (peanut, pay), so the tree predicts p. Correct.

44 Consider the test case "gopher". For its p, L1 = o and R1 = h: R1 = h? Yes. L1 = o? Yes. This leads to class 1 (loophole), so the tree predicts p. Wrong: the ph in gopher sounds like f. Although effective on the training data, this tree does not generalize well. It was constructed from too little data.

45 Decision Tree Construction. 1. Find the best question for partitioning the data at a given node into two equivalence classes. 2. Repeat step 1 recursively on each child node. 3. Stop when there is insufficient data to continue or when the best question is not sufficiently helpful.

46 Basic Issues to Solve. The selection of the splits. The decision of when to declare a node terminal or to continue splitting. The assignment of each node to a class.

47 Decision Tree Construction: Fundamental Operation. There is only one fundamental operation in tree construction: find the best question for partitioning a subset of the data into two smaller subsets, i.e., take an equivalence class and split it into two more-specific classes.

48 Decision Tree Greediness. Tree construction proceeds from the top down, from root to leaf. Each split is intended to be locally optimal. Constructing a tree in this greedy fashion usually leads to a good tree, but probably not a globally optimal one. Finding the globally optimal tree is an NP-complete problem: it is not practical.

49 Splitting. Each internal node has an associated splitting question. Example questions: Age <= 20? (numeric); Profession in {student, teacher}? (categorical); 5000*Age + 3*Salary − 10000 > 0? (function of raw features).

50 Dynamic Questions. The best question to ask about some discrete variable x consists of the best subset of the values taken by x, so we search over all subsets of values taken by x at a given node. (This is generating questions on the fly during tree construction.) For x ∈ {a,b,c}: Q1: x ∈ {a}? Q2: x ∈ {b}? Q3: x ∈ {c}? Q4: x ∈ {a,b}? Q5: x ∈ {a,c}? Q6: x ∈ {b,c}? Q7: x ∈ {a,b,c}? Use the best question found. Potential problems: 1. Requires a lot of CPU: for alphabet size |A| there are 2^|A| questions. 2. Allows a lot of freedom, making it easy to overtrain.
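A small sketch of how such questions could be enumerated; the helper name is an assumption for illustration. Subsets of size 0 and size |A| are omitted, since they cannot split a node:

```python
# Enumerate the "dynamic" subset questions "x in S?" for a discrete variable.
from itertools import combinations

def all_subset_questions(alphabet):
    # every non-empty proper subset S of the alphabet defines a question
    items = sorted(alphabet)
    return [set(s) for r in range(1, len(items))
            for s in combinations(items, r)]

print(len(all_subset_questions({"a", "b", "c"})))  # 6 splitting questions for |A| = 3
```

For |A| = 3 this reproduces Q1-Q6 above; Q7 (the full set) sends every sample the same way and so cannot split the data.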

51 Pre-determined Questions. The easiest way to construct a decision tree is to create in advance a list of possible questions for each variable. Finding the best question at any given node consists of subjecting all relevant variables to each of the questions, and picking the best combination of variable and question. In acoustic modeling, we typically ask about 10 variables: the 5 phones to the left of the current phone and the 5 phones to the right of the current phone. Since these variables all span the same alphabet (the phone alphabet), only one list of questions is needed. Each question on this list consists of a subset of the phone alphabet.

52 Sample Questions.

Phones: {P} {T} {K} {B} {D} {G} {P,T,K} {B,D,G} {P,T,K,B,D,G}
Letters: {A} {E} {I} {O} {U} {Y} {A,E,I,O,U} {A,E,I,O,U,Y}

53 Discrete Questions. A decision tree has a question associated with every non-terminal node. If x is a discrete variable which takes on values in some finite alphabet A, then a question about x has the form x ∈ S?, where S is a subset of A. Let L1 denote the preceding letter in building a spelling-to-sound tree, and let S = {A,E,I,O,U}. Then L1 ∈ S? denotes the question: is the preceding letter a vowel? Let R1 denote the following phone in building an acoustic context tree, and let S = {P,T,K}. Then R1 ∈ S? denotes the question: is the following phone an unvoiced stop?

54 Continuous Questions. If x is a continuous variable which takes on real values, a question about x has the form x < θ?, where θ is some real value. In order to find the threshold θ, we must try values which separate all training samples. [Figure: candidate thresholds θ1, θ2, θ3 along the x axis.] We do not currently use continuous questions for speech recognition.
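As a sketch, thresholds midway between consecutive distinct sample values are enough to separate all training samples; the function name is an assumption for illustration.

```python
def candidate_thresholds(xs):
    # midpoints between consecutive sorted values separate every pair of samples
    xs = sorted(set(xs))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]

print(candidate_thresholds([1.0, 3.0, 2.0, 2.0]))  # [1.5, 2.5]
```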

55 Types of Questions. In principle, a question asked in a decision tree can have any number (greater than 1) of possible outcomes. Examples: binary (Yes, No); 3 outcomes (Yes, No, Don't_know); 26 outcomes (A, B, C, ..., Z). In practice, only binary questions are used to build decision trees.

56 Simple Binary Question. A simple binary question consists of a single Boolean condition and no Boolean operators. X ∈ S1? is a simple question. ((X ∈ S1) && (X ∈ S2))? is not a simple question. Topologically, a simple question looks like: [diagram: a node x ∈ S? with Y and N branches]

57 Complex Binary Question. A complex binary question has precisely 2 outcomes (yes, no) but more than 1 Boolean condition and at least 1 Boolean operator. ((X ∈ S1) && (X ∈ S2))? is a complex question. Topologically this question can be drawn as a chain of simple questions, first x ∈ S1?, then x ∈ S2?, with outcomes: 1: (X ∈ S1) ∧ (X ∈ S2); 2: the complement, (X ∉ S1) ∨ (X ∉ S2). All complex questions can be represented as binary trees with terminal nodes tied to produce 2 outcomes.

58 Configurations Currently Used. All decision trees currently used in speech recognition use a pre-determined set of simple, binary questions on discrete variables.

59 Tree Construction Overview. Let x1 ... xn denote the n discrete variables whose values may be asked about, and let Qij denote the j-th pre-determined question for xi. Starting at the root, try splitting each node into two sub-nodes:

1. For each variable xi, evaluate the questions Qi1, Qi2, ... and let Qi* denote the best one.
2. Find the best pair (xi, Qi*) and denote it (x*, Q*).
3. If Q* is not sufficiently helpful, make the current node a leaf.
4. Otherwise, split the current node into two new sub-nodes according to the answer of question Q* on variable x*.

Stop when all nodes are either too small to split further or have been marked as leaves.
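A compact sketch of this loop, assuming data points are dicts mapping variables to values, questions are value subsets shared by all variables, and evaluate() returns the gain of a split; all names and the min_gain threshold are illustrative placeholders.

```python
def grow(data, variables, questions, evaluate, min_gain=0.5):
    # steps 1-2: best (variable, question) pair at this node
    pairs = [(x, q) for x in variables for q in questions]
    x, q = max(pairs, key=lambda xq: evaluate(data, *xq))
    yes = [d for d in data if d[x] in q]
    no  = [d for d in data if d[x] not in q]
    # step 3: declare a leaf if the best question is not sufficiently helpful
    if evaluate(data, x, q) < min_gain or not yes or not no:
        return {"leaf": data}
    # step 4: otherwise split and recurse on the two sub-nodes
    return {"question": (x, q),
            "yes": grow(yes, variables, questions, evaluate, min_gain),
            "no":  grow(no,  variables, questions, evaluate, min_gain)}
```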

60 Question Evaluation. The best question at a node is the question which maximizes the likelihood of the training data at that node after applying the question. Parent node: model the data as a single Gaussian N(µp, Σp) and compute the likelihood L(data | µp, Σp). Left child node: model as a single Gaussian N(µl, Σl) and compute L(data_l | µl, Σl). Right child node: model as a single Gaussian N(µr, Σr) and compute L(data_r | µr, Σr). Goal: find Q such that L(data_l | µl, Σl) × L(data_r | µr, Σr) is maximized.

61 Question Evaluation, cont'd. Let x1, x2, ..., xN be a sample of feature x, in which outcome ai occurs ci times. Let Q be a question which partitions this sample into left and right sub-samples of size nl and nr, respectively, and let ci^l, ci^r denote the frequency of ai in the left and right sub-samples. The best question Q for feature x is the one which maximizes the conditional likelihood of the sample given Q, or, equivalently, maximizes the conditional log likelihood of the sample.

62 Log likelihood computation. The log likelihood of the data, given that we ask question Q, is

$$\log L(x_1 .. x_N \mid Q) = \sum_i c_i^l \log p_i^l + \sum_i c_i^r \log p_i^r$$

Using the maximum likelihood estimates $\hat{p}_i^l = c_i^l / n_l$ and $\hat{p}_i^r = c_i^r / n_r$ gives

$$\log L(x_1 .. x_N \mid Q) = \sum_i c_i^l \log c_i^l + \sum_i c_i^r \log c_i^r - n_l \log n_l - n_r \log n_r$$

The best question is the one which maximizes this simple expression. Since $c_i^l$, $c_i^r$, $n_l$, and $n_r$ are all non-negative integers, the expression can be computed very efficiently using a pre-computed table of $n \log n$ for non-negative integers $n$.
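A sketch of this computation, with the n log n table pre-computed once; the table size is an arbitrary assumption.

```python
import math

NLOGN = [0.0] + [n * math.log2(n) for n in range(1, 100000)]

def split_log_likelihood(left_counts, right_counts):
    # sum_i c_i log c_i - n log n, summed over the two sides of the split
    def side(counts):
        return sum(NLOGN[c] for c in counts) - NLOGN[sum(counts)]
    return side(left_counts) + side(right_counts)

# counts (c_p, c_f, c_phi) for question A of the letter-to-sound example
print(round(split_log_likelihood([1, 4, 0], [3, 0, 4]), 2))  # -10.51
```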

63 Entropy. Entropy is a measure of uncertainty, measured in bits. Let x be a discrete random variable taking values a1 .. aN in an alphabet A of size N with probabilities p1 .. pN respectively. The uncertainty about what value x will take can be measured by the entropy of the probability distribution p = (p1 .. pN):

$$H = -\sum_{i=1}^{N} p_i \log_2 p_i$$

H ≥ 0, and H = 0 if and only if $p_j = 1$ for some j and $p_i = 0$ for all i ≠ j. Entropy is maximized when $p_i = 1/N$ for all i; then H = log2 N. Thus H tells us something about the sharpness of the distribution. H can be interpreted as the theoretical minimum average number of bits required to encode/transmit the distribution.

64 What does entropy look like for a binary variable? [Figure: the binary entropy function, 0 at p = 0 and p = 1, with a maximum of 1 bit at p = 0.5]
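In code, the definition above is a one-liner; evaluating it on a binary distribution reproduces the curve in the figure (a minimal sketch).

```python
import math

def entropy(ps):
    # H = -sum p_i log2 p_i, in bits; 0 log 0 is taken as 0
    return -sum(p * math.log2(p) for p in ps if p > 0)

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(entropy([p, 1 - p]), 3))   # maximum of 1 bit at p = 0.5
```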

65 Entropy and Likelihood. Let x be a discrete random variable taking values a1 .. aN in an alphabet A of size N with probabilities p1 .. pN respectively, and let x1 .. xn be a sample of x in which ai occurs ci times, i = 1..N. The sample likelihood is

$$L = \prod_{i=1}^{N} p_i^{c_i}$$

The maximum likelihood estimate of $p_i$ is $\hat{p}_i = c_i / n$. Thus, an estimate of the sample log likelihood is

$$\log \hat{L} = n \sum_{i=1}^{N} \hat{p}_i \log \hat{p}_i = -n \hat{H}$$

Since n is a constant, and log is a monotonic function, H is monotonically related to L. Therefore, maximizing likelihood ⇔ minimizing entropy.

66 p tree, revisited.

p: peanut, pay, loophole, apple → c_p = 4
f: physics, photo, graph, telephone → c_f = 4
φ: apple, psycho, pterodactyl, pneumonia → c_φ = 4
n = 12

The log likelihood of the data at the root node is

$$\log L(x_1 .. x_{12}) = \sum_i c_i \log_2 c_i - n \log_2 n = 3 \cdot 4\log_2 4 - 12\log_2 12 = -19.0$$

The average entropy at the root node is

$$H(x_1 .. x_{12}) = -\tfrac{1}{n} \log L(x_1 .. x_{12}) = 19.0 / 12 = 1.58 \text{ bits}$$

67 p tree revisited: Question A. R1 = h?

Yes: p: loophole; f: physics, telephone, graph, photo → n_l = 5, c_p^l = 1, c_f^l = 4, c_φ^l = 0
No: p: peanut, pay, apple; φ: apple, psycho, pterodactyl, pneumonia → n_r = 7, c_p^r = 3, c_f^r = 0, c_φ^r = 4

68 p tree revisited: Question A, cont'd. The log likelihood of the data after applying question A is

$$\log L(x_1 .. x_{12} \mid Q_A) = 1\log_2 1 + 4\log_2 4 - 5\log_2 5 + 3\log_2 3 + 4\log_2 4 - 7\log_2 7 = -10.5$$

The average entropy of the data after applying question A is

$$H(x_1 .. x_{12} \mid Q_A) = 10.5 / 12 = 0.87 \text{ bits}$$

The increase in log likelihood due to question A is 19.0 − 10.5 = 8.5. The decrease in entropy due to question A is 1.58 − 0.87 = 0.71 bits. Knowing the answer to question A provides 0.71 bits of information about the pronunciation of p. A further 0.87 bits of information is still required to remove all the uncertainty about the pronunciation of p.

69 p tree revisited: Question B. L1 = φ (the p is word-initial)?

Yes: p: peanut, pay; f: physics, photo; φ: psycho, pterodactyl, pneumonia → n_l = 7, c_p^l = 2, c_f^l = 2, c_φ^l = 3
No: p: loophole, apple; f: telephone, graph; φ: apple → n_r = 5, c_p^r = 2, c_f^r = 2, c_φ^r = 1

70 p tree revisited: Question B, cont'd. The log likelihood of the data after applying question B is

$$\log L(x_1 .. x_{12} \mid Q_B) = 2\log_2 2 + 2\log_2 2 + 3\log_2 3 - 7\log_2 7 + 2\log_2 2 + 2\log_2 2 + 1\log_2 1 - 5\log_2 5 = -18.5$$

The average entropy of the data after applying question B is 18.5 / 12 = 1.54 bits. The increase in log likelihood due to question B is 19.0 − 18.5 = 0.5. The decrease in entropy due to question B is 1.58 − 1.54 = 0.04 bits. Knowing the answer to question B provides 0.04 bits of information (very little) about the pronunciation of p.

71 p tree revisited: Question C. L1 = vowel?

Yes: p: loophole, apple; f: telephone, graph → n_l = 4, c_p^l = 2, c_f^l = 2, c_φ^l = 0
No: p: peanut, pay; f: physics, photo; φ: apple, psycho, pterodactyl, pneumonia → n_r = 8, c_p^r = 2, c_f^r = 2, c_φ^r = 4

72 p tree revisited: Question C, cont'd. The log likelihood of the data after applying question C is

$$\log L(x_1 .. x_{12} \mid Q_C) = 2\log_2 2 + 2\log_2 2 - 4\log_2 4 + 2\log_2 2 + 2\log_2 2 + 4\log_2 4 - 8\log_2 8 = -16.0$$

The average entropy of the data after applying question C is 16.0 / 12 = 1.33 bits. The increase in log likelihood due to question C is 19.0 − 16.0 = 3.0. The decrease in entropy due to question C is 1.58 − 1.33 = 0.25 bits. Knowing the answer to question C provides 0.25 bits of information about the pronunciation of p.

73 Comparison of Questions A, B, C. Log likelihood of data given question: A: −10.5, B: −18.5, C: −16.0. Average entropy (bits) of data given question: A: 0.87, B: 1.54, C: 1.33. Gain in information (bits) due to question: A: 0.71, B: 0.04, C: 0.25. These measures all say the same thing: question A is best, question C is 2nd best, and question B is worst.
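The whole comparison can be reproduced from the split counts alone, using the relation H = −(1/n) log L from slide 65; a sketch (the last digit may differ from the slides due to rounding).

```python
import math

def avg_entropy(*sides):
    # -(1/n) * sum over sides of sum_i c_i log2(c_i / n_side)
    n = sum(sum(s) for s in sides)
    ll = sum(c * math.log2(c / sum(s)) for s in sides for c in s if c)
    return -ll / n

root = avg_entropy([4, 4, 4])                       # 1.58 bits at the root
for name, left, right in [("A", [1, 4, 0], [3, 0, 4]),
                          ("B", [2, 2, 3], [2, 2, 1]),
                          ("C", [2, 2, 0], [2, 2, 4])]:
    h = avg_entropy(left, right)
    print(name, round(h, 2), "gain", round(root - h, 2))
```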

74 Best Question. In general, we seek questions which maximize the likelihood of the training data given some model. The expression to be maximized depends on the model: in the previous example the model is a multinomial distribution. Another example: context-dependent prototypes use a continuous distribution.

75 Best Question: Context-Dependent Prototypes. For context-dependent prototypes, we use a decision tree for each arc in the Markov model. At each leaf of the tree is a continuous distribution which serves as a context-dependent model of the acoustic vectors associated with the arc. The distributions and the tree are created from acoustic feature vectors aligned with the arc. We grow the tree so as to maximize the likelihood of the training data (as always), but now the training data are real-valued vectors. We estimate the likelihood of the acoustic vectors during tree construction using a diagonal Gaussian model. When tree construction is complete, we replace the diagonal Gaussian models at the leaves by more accurate models to serve as the final prototypes; typically we use mixtures of diagonal Gaussians.

76 Diagonal Gaussian Likelihood. Let Y = y1 y2 ... yn be a sample of independent p-dimensional acoustic vectors arising from a diagonal Gaussian distribution with mean µ and variances σ². Then

$$\log L(Y \mid DG(\mu, \sigma^2)) = -\frac{1}{2} \sum_{i=1}^{n} \Big\{ p \log(2\pi) + \sum_{j=1}^{p} \Big[ \log \sigma_j^2 + \frac{(y_{ij} - \mu_j)^2}{\sigma_j^2} \Big] \Big\}$$

The maximum likelihood estimates of µ and σ² are

$$\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^{n} y_{ij}, \qquad \hat{\sigma}_j^2 = \frac{1}{n} \sum_{i=1}^{n} y_{ij}^2 - \hat{\mu}_j^2, \qquad j = 1, \dots, p$$

Hence, an estimate of log L(Y) is

$$\log L(Y \mid DG(\hat{\mu}, \hat{\sigma}^2)) = -\frac{1}{2} \sum_{i=1}^{n} \Big\{ p \log(2\pi) + \sum_{j=1}^{p} \Big[ \log \hat{\sigma}_j^2 + \frac{(y_{ij} - \hat{\mu}_j)^2}{\hat{\sigma}_j^2} \Big] \Big\}$$

77 Diagonal Gaussian Likelihood, cont'd. Now,

$$\sum_{i=1}^{n} \frac{(y_{ij} - \hat{\mu}_j)^2}{\hat{\sigma}_j^2} = \frac{\sum_i y_{ij}^2 - 2\hat{\mu}_j \sum_i y_{ij} + n\hat{\mu}_j^2}{\hat{\sigma}_j^2} = \frac{n \hat{\sigma}_j^2}{\hat{\sigma}_j^2} = n, \qquad j = 1, \dots, p$$

Hence

$$\log L(Y \mid DG(\hat{\mu}, \hat{\sigma}^2)) = -\frac{1}{2} \Big\{ n\,p \log(2\pi) + n \sum_{j=1}^{p} \log \hat{\sigma}_j^2 + n\,p \Big\}$$

78 Diagonal Gaussian Splits. Let Q be a question which partitions Y into left and right sub-samples Y_l and Y_r, of size n_l and n_r. The best question is the one which maximizes log L(Y_l) + log L(Y_r). Using a diagonal Gaussian model,

$$\log L(Y_l \mid DG(\hat{\mu}_l, \hat{\sigma}_l^2)) + \log L(Y_r \mid DG(\hat{\mu}_r, \hat{\sigma}_r^2)) = -\frac{1}{2} \Big\{ n\,p \log(2\pi) + n\,p \Big\} - \frac{1}{2} \Big\{ n_l \sum_{j=1}^{p} \log \hat{\sigma}_{lj}^2 + n_r \sum_{j=1}^{p} \log \hat{\sigma}_{rj}^2 \Big\}$$

where the first term (with n = n_l + n_r) is common to all splits.

79 Diagonal Gaussian Splits, cont'd. Thus, the best question Q minimizes

$$D_Q = n_l \sum_{j=1}^{p} \log \hat{\sigma}_{lj}^2 + n_r \sum_{j=1}^{p} \log \hat{\sigma}_{rj}^2$$

where

$$\hat{\sigma}_{lj}^2 = \frac{1}{n_l} \sum_{y \in Y_l} y_j^2 - \Big( \frac{1}{n_l} \sum_{y \in Y_l} y_j \Big)^2, \qquad j = 1, \dots, p$$

and similarly for $\hat{\sigma}_{rj}^2$. D_Q involves little more than summing vector elements and their squares.
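A sketch of the D_Q computation in plain Python, keeping only the running sums the slide mentions; the function and variable names are assumptions for illustration.

```python
import math

def dq(Y_left, Y_right):
    # D_Q = n_l * sum_j log var_lj + n_r * sum_j log var_rj
    def term(Y):
        n, p = len(Y), len(Y[0])
        s  = [sum(y[j] for y in Y) for j in range(p)]        # sums of elements
        s2 = [sum(y[j] ** 2 for y in Y) for j in range(p)]   # sums of squares
        return n * sum(math.log(s2[j] / n - (s[j] / n) ** 2) for j in range(p))
    return term(Y_left) + term(Y_right)   # the best question minimizes this

print(dq([(0.0, 1.0), (0.2, 1.1)], [(5.0, 3.0), (5.5, 2.8)]))
```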

80 How Big a Tree? CART suggests cross-validation: measure performance on a held-out data set and choose the tree size that maximizes the likelihood of the held-out data. In practice, simple heuristics seem to work well. A decision tree is fully grown when no terminal node can be split. Reasons for not splitting a node include: 1. insufficient data for accurate question evaluation; 2. the best question was not very helpful / did not improve the likelihood significantly; 3. cannot cope with any more nodes due to CPU/memory limitations.

81 Instability. Decision trees have the undesirable property that a small change in the data can result in a large difference in the final tree. Consider a 2-class problem, w1 and w2, where the observations for each class are 2-dimensional. [Figure: scatter plot of the two classes.]

82 [Figure: the tree grown on the original data, with splits such as x1 < 0.3, x2 < 0.6, and x1 < 0.35, and the corresponding axis-parallel partition of the plane.]

83 [Figure: after one data point is moved slightly, the grown tree changes substantially, with splits such as x1 < 0.09 and x2 < 0.6 and a very different partition.]

84 Bagging. Create multiple models by training the same learner on different samples of the training data: given a training set of size n, create m different training sets of size n by sampling with replacement, and combine the m models using a simple majority vote. Bagging can be applied to any learning method, including decision trees. It decreases the generalization error by reducing variance in the results for unstable learners (i.e., when the hypothesis can change dramatically if the training data is slightly altered).
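A minimal sketch of the procedure, where train() builds one classifier from a dataset and returns a callable model; both names are placeholders, not from the lecture.

```python
import random
from collections import Counter

def bag(train, data, m=10):
    # m bootstrap samples of size n, drawn with replacement
    models = [train([random.choice(data) for _ in data]) for _ in range(m)]
    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]      # simple majority vote
    return predict
```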


86 Boosting. Another method for producing multiple models by repeatedly altering the data given to a learner. Examples are given weights, and at each iteration a new hypothesis is learned and the examples are reweighted to focus on those that the latest hypothesis got wrong. During testing, each of the hypotheses gets a weighted vote proportional to its training set accuracy.
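An AdaBoost-style sketch of this loop, assuming labels in {-1, +1} and a train() that accepts example weights; this is one standard formulation, not necessarily the variant the lecture has in mind.

```python
import math

def boost(train, xs, ys, rounds=10):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h = train(xs, ys, w)
        err = sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
        if err <= 0 or err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)   # vote grows with accuracy
        ensemble.append((alpha, h))
        # reweight to focus on the examples the latest hypothesis got wrong
        w = [wi * math.exp(alpha if h(x) != y else -alpha)
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```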

87 Strengths & Weaknesses of Decision Trees. Strengths: easy to generate, simple algorithm; relatively fast to construct; classification is very fast; can achieve good performance on many tasks. Weaknesses: not always sufficient to learn complex concepts; can be hard to interpret, since real problems can produce large trees; some problems with continuously valued attributes may not be easily discretized; data fragmentation.

88 Putting it all together. Given a word sequence, we can construct the corresponding Markov model by: 1. rewriting the word string as a sequence of phonemes; 2. concatenating the phonetic models; 3. using the appropriate tree for each arc to determine which allo-arc (leaf) is to be used in that context.

89 Example. "The rain in Spain falls ...". Look these words up in the dictionary to get: DH AX R EY N IX N S P EY N F AA L Z. Rewrite the phones as states according to the phonetic model: DH1 DH2 DH3 AX1 AX2 AX3 R1 R2 R3 EY1 EY2 EY3 ... Using the phonetic context, descend the decision tree for each state to find the leaf sequence: DH1_5 DH2_7 DH3_4 AX1_53 AX2_37 AX3_... R1_4... R2_46 ... Use the Gaussian mixture model for the appropriate leaf as the observation probabilities for each state in the Hidden Markov Model.
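A sketch of that pipeline; the lexicon, per-state trees, and leaf GMM tables are hypothetical stand-ins for the real system's data structures.

```python
def build_observation_models(words, lexicon, trees, leaf_gmms):
    phones = [ph for w in words for ph in lexicon[w]]        # 1. baseforms
    states = [(ph, k) for ph in phones for k in (1, 2, 3)]   # 2. three states per phone
    leaves = [trees[st](phones) for st in states]            # 3. descend tree in context
    return [leaf_gmms[leaf] for leaf in leaves]              # GMMs per HMM state
```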

90 Course Feedback. Was this lecture mostly clear or unclear? What was the muddiest topic? Other feedback (pace, content, atmosphere, etc.).


More information

FE FORMULATIONS FOR PLASTICITY

FE FORMULATIONS FOR PLASTICITY G These slides are designed based on the book: Finite Elements in Plasticity Theory and Practice, D.R.J. Owen and E. Hinton, 1970, Pineridge Press Ltd., Swansea, UK. 1 Course Content: A INTRODUCTION AND

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018 CS17 Integrated Introduction to Computer Science Klein Contents Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018 1 Tree definitions 1 2 Analysis of mergesort using a binary tree 1 3 Analysis of

More information

CSE 599d - Quantum Computing When Quantum Computers Fall Apart

CSE 599d - Quantum Computing When Quantum Computers Fall Apart CSE 599d - Quantum Comuting When Quantum Comuters Fall Aart Dave Bacon Deartment of Comuter Science & Engineering, University of Washington In this lecture we are going to begin discussing what haens to

More information

Supplementary Materials for Robust Estimation of the False Discovery Rate

Supplementary Materials for Robust Estimation of the False Discovery Rate Sulementary Materials for Robust Estimation of the False Discovery Rate Stan Pounds and Cheng Cheng This sulemental contains roofs regarding theoretical roerties of the roosed method (Section S1), rovides

More information

An Analysis of Reliable Classifiers through ROC Isometrics

An Analysis of Reliable Classifiers through ROC Isometrics An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit

More information

Multi-Operation Multi-Machine Scheduling

Multi-Operation Multi-Machine Scheduling Multi-Oeration Multi-Machine Scheduling Weizhen Mao he College of William and Mary, Williamsburg VA 3185, USA Abstract. In the multi-oeration scheduling that arises in industrial engineering, each job

More information

Outline. EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Simple Error Detection Coding

Outline. EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Simple Error Detection Coding Outline EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Error detection using arity Hamming code for error detection/correction Linear Feedback Shift

More information

16.2. Infinite Series. Introduction. Prerequisites. Learning Outcomes

16.2. Infinite Series. Introduction. Prerequisites. Learning Outcomes Infinite Series 6.2 Introduction We extend the concet of a finite series, met in Section 6., to the situation in which the number of terms increase without bound. We define what is meant by an infinite

More information

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com

More information

Yixi Shi. Jose Blanchet. IEOR Department Columbia University New York, NY 10027, USA. IEOR Department Columbia University New York, NY 10027, USA

Yixi Shi. Jose Blanchet. IEOR Department Columbia University New York, NY 10027, USA. IEOR Department Columbia University New York, NY 10027, USA Proceedings of the 2011 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelsach, K. P. White, and M. Fu, eds. EFFICIENT RARE EVENT SIMULATION FOR HEAVY-TAILED SYSTEMS VIA CROSS ENTROPY Jose

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder Bayesian inference & Markov chain Monte Carlo Note 1: Many slides for this lecture were kindly rovided by Paul Lewis and Mark Holder Note 2: Paul Lewis has written nice software for demonstrating Markov

More information

Unit 1 - Computer Arithmetic

Unit 1 - Computer Arithmetic FIXD-POINT (FX) ARITHMTIC Unit 1 - Comuter Arithmetic INTGR NUMBRS n bit number: b n 1 b n 2 b 0 Decimal Value Range of values UNSIGND n 1 SIGND D = b i 2 i D = 2 n 1 b n 1 + b i 2 i n 2 i=0 i=0 [0, 2

More information

Hypothesis Test-Confidence Interval connection

Hypothesis Test-Confidence Interval connection Hyothesis Test-Confidence Interval connection Hyothesis tests for mean Tell whether observed data are consistent with μ = μ. More secifically An hyothesis test with significance level α will reject the

More information

p(-,i)+p(,i)+p(-,v)+p(i,v),v)+p(i,v)

p(-,i)+p(,i)+p(-,v)+p(i,v),v)+p(i,v) Multile Sequence Alignment Given: Set of sequences Score matrix Ga enalties Find: Alignment of sequences such that otimal score is achieved. Motivation Aligning rotein families Establish evolutionary relationshis

More information

Recent Developments in Multilayer Perceptron Neural Networks

Recent Developments in Multilayer Perceptron Neural Networks Recent Develoments in Multilayer Percetron eural etworks Walter H. Delashmit Lockheed Martin Missiles and Fire Control Dallas, Texas 75265 walter.delashmit@lmco.com walter.delashmit@verizon.net Michael

More information

Probability Estimates for Multi-class Classification by Pairwise Coupling

Probability Estimates for Multi-class Classification by Pairwise Coupling Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics

More information

CSC165H, Mathematical expression and reasoning for computer science week 12

CSC165H, Mathematical expression and reasoning for computer science week 12 CSC165H, Mathematical exression and reasoning for comuter science week 1 nd December 005 Gary Baumgartner and Danny Hea hea@cs.toronto.edu SF4306A 416-978-5899 htt//www.cs.toronto.edu/~hea/165/s005/index.shtml

More information

Chapter 7 Rational and Irrational Numbers

Chapter 7 Rational and Irrational Numbers Chater 7 Rational and Irrational Numbers In this chater we first review the real line model for numbers, as discussed in Chater 2 of seventh grade, by recalling how the integers and then the rational numbers

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

PHYS 301 HOMEWORK #9-- SOLUTIONS

PHYS 301 HOMEWORK #9-- SOLUTIONS PHYS 0 HOMEWORK #9-- SOLUTIONS. We are asked to use Dirichlet' s theorem to determine the value of f (x) as defined below at x = 0, ± /, ± f(x) = 0, - < x

More information

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES AARON ZWIEBACH Abstract. In this aer we will analyze research that has been recently done in the field of discrete

More information

One-way ANOVA Inference for one-way ANOVA

One-way ANOVA Inference for one-way ANOVA One-way ANOVA Inference for one-way ANOVA IPS Chater 12.1 2009 W.H. Freeman and Comany Objectives (IPS Chater 12.1) Inference for one-way ANOVA Comaring means The two-samle t statistic An overview of ANOVA

More information

Brownian Motion and Random Prime Factorization

Brownian Motion and Random Prime Factorization Brownian Motion and Random Prime Factorization Kendrick Tang June 4, 202 Contents Introduction 2 2 Brownian Motion 2 2. Develoing Brownian Motion.................... 2 2.. Measure Saces and Borel Sigma-Algebras.........

More information

An Improved Calibration Method for a Chopped Pyrgeometer

An Improved Calibration Method for a Chopped Pyrgeometer 96 JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY VOLUME 17 An Imroved Calibration Method for a Choed Pyrgeometer FRIEDRICH FERGG OtoLab, Ingenieurbüro, Munich, Germany PETER WENDLING Deutsches Forschungszentrum

More information

A New Perspective on Learning Linear Separators with Large L q L p Margins

A New Perspective on Learning Linear Separators with Large L q L p Margins A New Persective on Learning Linear Searators with Large L q L Margins Maria-Florina Balcan Georgia Institute of Technology Christoher Berlind Georgia Institute of Technology Abstract We give theoretical

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Numerous alications in statistics, articularly in the fitting of linear models. Notation and conventions: Elements of a matrix A are denoted by a ij, where i indexes the rows and

More information

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014 Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get

More information

Finite-State Verification or Model Checking. Finite State Verification (FSV) or Model Checking

Finite-State Verification or Model Checking. Finite State Verification (FSV) or Model Checking Finite-State Verification or Model Checking Finite State Verification (FSV) or Model Checking Holds the romise of roviding a cost effective way of verifying imortant roerties about a system Not all faults

More information

Hidden Predictors: A Factor Analysis Primer

Hidden Predictors: A Factor Analysis Primer Hidden Predictors: A Factor Analysis Primer Ryan C Sanchez Western Washington University Factor Analysis is a owerful statistical method in the modern research sychologist s toolbag When used roerly, factor

More information

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming Maximum Entroy and the Stress Distribution in Soft Disk Packings Above Jamming Yegang Wu and S. Teitel Deartment of Physics and Astronomy, University of ochester, ochester, New York 467, USA (Dated: August

More information

Section 0.10: Complex Numbers from Precalculus Prerequisites a.k.a. Chapter 0 by Carl Stitz, PhD, and Jeff Zeager, PhD, is available under a Creative

Section 0.10: Complex Numbers from Precalculus Prerequisites a.k.a. Chapter 0 by Carl Stitz, PhD, and Jeff Zeager, PhD, is available under a Creative Section 0.0: Comlex Numbers from Precalculus Prerequisites a.k.a. Chater 0 by Carl Stitz, PhD, and Jeff Zeager, PhD, is available under a Creative Commons Attribution-NonCommercial-ShareAlike.0 license.

More information