EECS E6870: Lecture 6: Pronunciation Modeling and Decision Trees
1 EECS E6870: Lecture 6: Pronunciation Modeling and Decision Trees. Stanley F. Chen, Michael A. Picheny and Bhuvana Ramabhadran, IBM T. J. Watson Research Center, Yorktown Heights, NY. October 3, 2009. EECS E6870: Speech Recognition
2 In the beginning was the whole-word model. For each word in the vocabulary, decide on a topology. Often the number of states in the model is chosen to be proportional to the number of phonemes in the word. Train the observation and transition parameters for a given word using examples of that word in the training data (recall Problem 3 associated with Hidden Markov Models). Good domain for this approach: digits. Speech Recognition Lecture 6: Pronunciation Modeling and Decision Trees
3 Example topologies: Digits. Vocabulary consists of { zero, oh, one, two, three, four, five, six, seven, eight, nine }. Assume we assign two states per phoneme. Must allow for different durations: use self-loops and skip arcs. Models look like: (figure: HMM topologies for zero and oh)
4 (figure: HMM topologies for one, two, three, four, five)
5 (figure: HMM topologies for six, seven, eight, nine)
6 How to represent any sequence of digits? (figure)
7 (figure: looped digit network for representing any digit sequence)
8 Trellis Representation (figure: states vs. time)
9 Whole-word model limitations. The whole-word model suffers from two main problems: 1. Cannot model unseen words; in fact, we need several samples of each word to train the models properly. Cannot share data among models: the data sparseness problem. 2. The number of parameters in the system is proportional to the vocabulary size. Thus, whole-word models are best on small-vocabulary tasks.
10 Subword Units. To reduce the number of parameters, we can compose word models from sub-word units. These units can be shared among words. Examples include (unit, approximate number): phones, 50; diphones, 2,000; triphones, 10,000; syllables, 5,000. Each unit is small. The number of parameters is proportional to the number of units (not the number of words in the vocabulary as in whole-word models).
11 Phonetic Models. We represent each word as a sequence of phonemes. This representation is the baseform for the word. BANDS: B AE N D Z. Some words need more than one baseform: THE: DH UH, DH IY.
12 Baseform Dictionary. To determine the pronunciation of each word, we look it up in a dictionary. Each word may have several possible pronunciations. Every word in our training script and test vocabulary must be in the dictionary. The dictionary is generally written by hand, and is prone to errors and inconsistencies: the 2nd entry looks wrong ("AA K ..." is missing); shouldn't the choices for the 1st phoneme be the same for these words?; don't these words all start the same? acapulco AE K AX P AH L K OW; acapulco AE K AX P UH K OW; accelerator AX K S EH L AX R EY DX ER; accelerator IX K S EH L AX R EY DX ER; acceleration AX K S EH L AX R EY SH IX N; acceleration AE K S EH L AX R EY SH IX N; accent AE K S EH N T; accept AX K S EH P T; acceptable AX K S EH P T AX B AX L; access AE K S EH S; accessory AX K S EH S AX R IY; accessory EH K S EH S AX R IY
13 Phonetic Models, cont'd. We can allow for phonological variation by representing baseforms as graphs. acapulco AE K AX P AH L K OW; acapulco AA K AX P UH K OW. (figure: graph baseform with alternatives AE/AA and AH/UH)
14 Phonetic Models, cont'd. Now, construct a Markov model for each phone. Examples: (figure: phone models of varying length)
15 Embedding. Replace each phone by its Markov model to get a word model. N.B. the model for each phone will have different parameter values. (figure: embedded word model for AE/AA K AX P AH/UH L K OW)
16 Reducing Parameters by Tying. Consider the three-state model with transitions t1, t2, t3, t4, t5, t6. Note that t1 and t2 correspond to the beginning of the phone, t3 and t4 correspond to the middle of the phone, and t5 and t6 correspond to the end of the phone. If we force the output distributions for each member of those pairs to be the same, then the training data requirements are reduced.
17 Tying. A set of arcs in a Markov model are tied to one another if they are constrained to have identical output distributions. Similarly, states are tied if they have identical transition probabilities. Tying can be explicit or implicit.
18 Implicit Tying. Occurs when we build up models for larger units from models of smaller units. Example: when word models are made from phone models. First, consider an example without any tying. Let the vocabulary consist of the digits 0, 1, 2, ..., 9. We can make a separate model for each word. To estimate parameters for each word model, we need several samples of each word. Samples of 0 affect only the parameters for the 0 model.
19 Implicit Tying, cont'd. Now consider phone-based models for this vocabulary: 0 Z IY R OW; 1 W AA N; 2 T UW; 3 TH R IY; 4 F AO R; 5 F AY V; 6 S IH K S; 7 S EH V AX N; 8 EY T; 9 N AY N. Training samples of 0 will also affect the models for 3 and 4. Useful in large-vocabulary systems where the number of words is much greater than the number of phones.
20 Explicit Tying. Example: (figure: phone model with arcs b1, b2, m1, m2, e1, e2) 6 non-null arcs, but only 3 different output distributions because of tying. The number of model parameters is reduced. Tying saves storage because only one copy of each distribution is saved. Fewer parameters mean less training data is needed.
21 Variations in realizations of phonemes. The broad units, phonemes, have variants known as allophones. Example: in English, p and pʰ (unaspirated and aspirated p). Exercise: put your hand in front of your mouth and pronounce "spin" and then "pin". Note that the p in "pin" has a puff of air, while the p in "spin" does not. Articulators have inertia; thus the pronunciation of a phoneme is influenced by surrounding phonemes. This is known as co-articulation. Example: consider k and g in different contexts. In "key" and "geese" the whole body of the tongue has to be pulled up to make the vowel; the closure of the k moves forward compared to "caw" and "gauze". Phonemes have canonical articulator target positions that may or may not be reached in a particular utterance.
22 "keep" (figure)
23 "coop" (figure)
24 Context-dependent models. We can model phones in context. Two approaches: triphones and leaves. Both methods use clustering: triphones use bottom-up clustering, leaves use top-down clustering. Typical improvement of speech recognizers when introducing context dependence: 30%-50% fewer errors.
25 Tri-phone models. Model each phoneme in the context of its left and right neighbor. E.g., K-IY+P is a model for IY when K is its left-context phoneme and P is its right-context phoneme. If we have 50 phonemes in a language, we could have as many as 50³ = 125,000 triphones to model. Not all of these occur. We still have data sparsity issues. Try to solve these issues by agglomerative clustering.
26 Agglomerative / Bottom-up Clustering. Start with each item in a cluster by itself. Find the closest pair of clusters. Merge them into a single cluster. Iterate. Different results are obtained depending on the distance measure used. Single-link: dist(A,B) = min dist(a,b) for a ∈ A, b ∈ B. Complete-link: dist(A,B) = max dist(a,b) for a ∈ A, b ∈ B.
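The merge loop described above can be sketched in a few lines. The 1-D points, the function names, and the stopping criterion (a target number of clusters) are illustrative choices for this sketch, not from the lecture.

```python
def dist(a, b):
    # Distance between two 1-D points.
    return abs(a - b)

def linkage_dist(A, B, linkage):
    # Single-link: distance between the closest pair of points.
    # Complete-link: distance between the farthest pair of points.
    pairs = [dist(a, b) for a in A for b in B]
    return min(pairs) if linkage == "single" else max(pairs)

def agglomerate(points, k, linkage="single"):
    # Start with each item in its own cluster; repeatedly merge the
    # closest pair of clusters until only k clusters remain.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage_dist(clusters[i], clusters[j], linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

print(agglomerate([0.0, 1.0, 2.1, 9.0, 9.5], k=2, linkage="single"))
```

Swapping `linkage="complete"` changes only the cluster-to-cluster distance, which is exactly what produces the stringy versus roundish behavior discussed on the next two slides.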
27 Bottom-up clustering / Single Link. Assume our data points look like: (figure). Single-link: clusters are close if any of their points are: dist(A,B) = min dist(a,b) for a ∈ A, b ∈ B. Single-link clustering into 2 groups proceeds as: (figure). Single-link clustering tends to yield long, stringy, meandering clusters.
28 Bottom-up clustering / Complete Link. Again, assume our data points look like: (figure). Complete-link: clusters are close only if ALL of their points are: dist(A,B) = max dist(a,b) for a ∈ A, b ∈ B. Complete-link clustering into 2 groups proceeds as: (figure). Complete-link clustering tends to yield roundish clusters.
29 Dendrogram. A natural way to display clusters is through a dendrogram: the clusters are shown on the x-axis and the distance between clusters on the y-axis. Provides some guidance as to a good choice for the number of clusters. (figure: dendrogram over points x1 ... x8)
30 Triphone Clustering. We can use, e.g., complete-link clustering to cluster triphones. This helps with the data sparsity issue, but we still have an issue with unseen data. To model unseen events, we need to back off to lower-order models such as bi-phones and uni-phones.
31 Allophones. Pronunciation variations of phonemes. Less rigidly defined than triphones. In our speech recognition system, we typically have about 50 allophones per phoneme. Older techniques ("tri-phones") tried to enumerate by hand the set of allophones for each phoneme. We currently use automatic methods to identify the allophones of a phoneme. This boils down to conditional modeling.
32 Conditional Modeling. In speech recognition, we use conditional models and conditional statistics. Acoustic model: conditioned on the phone context. Spelling-to-sound rules: describe the pronunciation of a letter in terms of a probability distribution that depends on neighboring letters and their pronunciations; the distribution is conditioned on the letter context and pronunciation context. Language model: the probability distribution for the next word is conditioned on the preceding words. One way to define the set of conditions is a class of statistical models known as decision trees. Classic text: L. Breiman et al., Classification and Regression Trees. Wadsworth & Brooks, Monterey, California, 1984.
33 Decision Trees. We would like to find equivalence classes among our training samples. DTs categorize data into multiple disjoint categories. The purpose of a decision tree is to map conditions (such as phonetic contexts) into equivalence classes; the goal in constructing a decision tree is to create good equivalence classes. DTs are examples of divisive / top-down clustering. Each internal node specifies an attribute that an object must be tested on; each leaf node represents a classification.
34 What does a decision tree look like? (figure)
35 Types of Attributes/Features. Numerical: the domain is ordered and can be represented on the real line (e.g., age, income). Nominal or categorical: the domain is a finite set without any natural ordering (e.g., occupation, marital status, race). Ordinal: the domain is ordered, but absolute differences between values are unknown (e.g., preference scale, severity of an injury).
36 The Classification Problem. If the dependent variable is categorical, the problem is a classification problem. Let C be the class label of a given data point X = X1, ..., Xk. Let d(.) be the predicted class label. Define the misclassification rate of d as Prob(d(X1, ..., Xk) != C). Problem definition: given a dataset, find the classifier d such that the misclassification rate is minimized.
37 The Regression Problem. If the dependent variable is numerical, the problem is a regression problem. The tree d maps an observation X to a prediction Y' of Y and is called a regression function. Define the mean squared error rate of d as E[(Y - d(X1, ..., Xk))^2]. Problem definition: given a dataset, find the regression function d such that the mean squared error is minimized.
38 Goals & Requirements. Goals: to produce an accurate classifier/regression function, and to understand the structure of the problem. Requirements on the model: high accuracy; understandable by humans, interpretable; fast construction for very large training databases.
39 Decision Trees: Letter-to-Sound Example. Let's say we want to build a tree to decide how the letter p will sound in various words. Training examples: p: loophole, peanuts, pay, apple; f: physics, telephone, graph, photo; φ (silent): apple, psycho, pterodactyl, pneumonia. (Apple appears twice: its first p is pronounced, its second is silent.) The pronunciation of p depends on its context. Task: using the above training data, partition the contexts into equivalence classes so as to minimize the uncertainty of the pronunciation.
40 Decision Trees: Letter-to-Sound Example, cont'd. Denote the context as L2 L1 p R1 R2; the questions below ask about the immediate neighbors L (= L1) and R (= R1). First question: R = h? Yes: loophole (p); physics, telephone, graph, photo (f). No: peanut, pay, apple (p); apple, psycho, pterodactyl, pneumonia (φ). At this point we have two equivalence classes: 1. R = h, and 2. R != h. The pronunciation in class 1 is either p or f, with f much more likely than p. The pronunciation in class 2 is either p or φ.
41 Splitting further: in the R = h branch, ask L = o? Yes: loophole (p). No: physics, telephone, graph, photo (f). In the R != h branch, ask R = consonant? Yes (class 3): apple (p); apple, psycho, pterodactyl, pneumonia (φ). No (class 4): peanut, pay (p). Four equivalence classes; uncertainty remains only in class 3.
42 Finally, split class 3 with L = a? Yes: apple (p). No: apple, psycho, pterodactyl, pneumonia (φ). This gives five equivalence classes, which is much less than the number of letter contexts: 1: loophole (p); 2: physics, telephone, graph, photo (f); 3: apple (p); 4: apple, psycho, pterodactyl, pneumonia (φ); 5: peanut, pay (p). No uncertainty is left in the classes. A node without children (an equivalence class) is called a leaf node; otherwise it is called an internal node.
43 Consider the test case "Paris". For its p, R = a: R = h? is No, and R = consonant? is No, so it falls in class 5 (peanut, pay) and is pronounced p. Correct.
44 Consider test case 2: "gopher". For its p, R = h (Yes) and L = o (Yes), so it falls in class 1 (loophole) and is pronounced p. Wrong. Although effective on the training data, this tree does not generalize well. It was constructed from too little data.
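The finished five-leaf tree above can be transcribed as nested conditionals. The function name, the '#' word-boundary marker, and the explicit consonant set are my own conventions for this sketch, not the lecture's.

```python
def pronounce_p(L, R):
    # L and R are the letters immediately before and after the 'p';
    # '#' marks a word boundary.
    if R == 'h':
        return 'p' if L == 'o' else 'f'        # class 1 vs. class 2
    if R in 'bcdfghjklmnpqrstvwxz':            # R = consonant?
        return 'p' if L == 'a' else 'silent'   # class 3 vs. class 4
    return 'p'                                 # class 5

print(pronounce_p('#', 'a'))  # Paris: 'p', correct
print(pronounce_p('o', 'h'))  # gopher: tree says 'p', truth is 'f'
```

The gopher case falls into the loophole leaf, reproducing the generalization failure described on the slide.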
45 Decision Tree Construction. 1. Find the best question for partitioning the data at a given node into two equivalence classes. 2. Repeat step 1 recursively on each child node. 3. Stop when there is insufficient data to continue or when the best question is not sufficiently helpful.
46 Basic Issues to Solve. The selection of the splits. The decision of when to declare a node terminal or to continue splitting. The assignment of each terminal node to a class.
47 Decision Tree Construction: Fundamental Operation. There is only one fundamental operation in tree construction: find the best question for partitioning a subset of the data into two smaller subsets, i.e., take an equivalence class and split it into more-specific classes.
48 Decision Tree Greediness. Tree construction proceeds from the top down, from root to leaf. Each split is intended to be locally optimal. Constructing a tree in this greedy fashion usually leads to a good tree, but probably not a globally optimal one. Finding the globally optimal tree is an NP-complete problem: it is not practical.
49 Splitting. Each internal node has an associated splitting question. Example questions: Age <= 20 (numeric); Profession in {student, teacher} (categorical); 5000*Age + 3*Salary - 10000 > 0 (function of raw features).
50 Dynamic Questions. The best question to ask about some discrete variable x consists of the best subset of the values taken by x, found by searching over all subsets of values taken by x at a given node. (This is generating questions on the fly during tree construction.) For x ∈ {a,b,c}: Q1: x ∈ {a}? Q2: x ∈ {b}? Q3: x ∈ {c}? Q4: x ∈ {a,b}? Q5: x ∈ {a,c}? Q6: x ∈ {b,c}? Q7: x ∈ {a,b,c}? Use the best question found. Potential problems: 1. Requires a lot of CPU: for an alphabet of size |A| there are 2^|A| questions. 2. Allows a lot of freedom, making it easy to overtrain.
51 Pre-determined Questions. The easiest way to construct a decision tree is to create in advance a list of possible questions for each variable. Finding the best question at any given node consists of subjecting all relevant variables to each of the questions and picking the best combination of variable and question. In acoustic modeling, we typically ask about 10 variables: the 5 phones to the left of the current phone and the 5 phones to the right of the current phone. Since these variables all span the same alphabet (the phone alphabet), only one list of questions is needed. Each question on this list consists of a subset of the phone alphabet.
52 Sample Questions. Phones: {P}, {T}, {K}, {B}, {D}, {G}, {P,T,K}, {B,D,G}, {P,T,K,B,D,G}. Letters: {A}, {E}, {I}, {O}, {U}, {Y}, {A,E,I,O,U}, {A,E,I,O,U,Y}.
53 Discrete Questions. A decision tree has a question associated with every non-terminal node. If x is a discrete variable which takes on values in some finite alphabet A, then a question about x has the form x ∈ S?, where S is a subset of A. Let L denote the preceding letter in building a spelling-to-sound tree, and let S = {A,E,I,O,U}. Then L ∈ S? denotes the question: is the preceding letter a vowel? Let R denote the following phone in building an acoustic context tree, and let S = {P,T,K}. Then R ∈ S? denotes the question: is the following phone an unvoiced stop?
54 Continuous Questions. If x is a continuous variable which takes on real values, a question about x has the form x < θ?, where θ is some real value. In order to find the threshold θ, we must try values which separate all training samples. (figure: candidate thresholds θ1, θ2, θ3 along the x axis) We do not currently use continuous questions for speech recognition.
55 Types of Questions. In principle, a question asked in a decision tree can have any number (greater than 1) of possible outcomes. Examples: binary: Yes / No; 3 outcomes: Yes / No / Don't_know; 26 outcomes: A / B / C / ... / Z. In practice, only binary questions are used to build decision trees.
56 Simple Binary Question. A simple binary question consists of a single Boolean condition and no Boolean operators. X ∈ S? is a simple question. ((X1 ∈ S1) && (X2 ∈ S2))? is not a simple question. Topologically, a simple question looks like a single node x ∈ S? with Y and N branches.
57 Complex Binary Question. A complex binary question has precisely 2 outcomes (yes, no) but more than 1 Boolean condition and at least 1 Boolean operator. ((X1 ∈ S1) && (X2 ∈ S2))? is a complex question. Topologically this question can be shown as a small tree: test X1 ∈ S1?, and on the Y branch test X2 ∈ S2?. Outcomes: 1: (X1 ∈ S1) ∧ (X2 ∈ S2); 2: ¬((X1 ∈ S1) ∧ (X2 ∈ S2)) = ¬(X1 ∈ S1) ∨ ¬(X2 ∈ S2). All complex questions can be represented as binary trees with terminal nodes tied to produce 2 outcomes.
58 Configurations Currently Used. All decision trees currently used in speech recognition use a pre-determined set of simple, binary questions on discrete variables.
59 Tree Construction Overview. Let x1, ..., xn denote n discrete variables whose values may be asked about. Let Qij denote the j-th pre-determined question for xi. Starting at the root, try splitting each node into two sub-nodes: 1. For each variable xi, evaluate the questions Qi1, Qi2, ..., and let Q'i denote the best one. 2. Find the best pair (xi, Q'i) and denote it (x', Q'). 3. If Q' is not sufficiently helpful, make the current node a leaf. 4. Otherwise, split the current node into two new sub-nodes according to the answer of question Q' on variable x'. Stop when all nodes are either too small to split further or have been marked as leaves.
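The node-splitting step above can be sketched on the letter-to-sound data from the running p example. The data encoding (preceding letter L, following letter R, with '#' for a word boundary), the question names, and the use of base-2 logs are my own conventions for this sketch.

```python
import math

DATA = [  # (L, R, pronunciation of 'p') for the 12 training examples
    ('o', 'h', 'p'), ('#', 'e', 'p'), ('#', 'a', 'p'), ('a', 'p', 'p'),
    ('#', 'h', 'f'), ('e', 'h', 'f'), ('a', 'h', 'f'), ('#', 'h', 'f'),
    ('p', 'l', 'silent'), ('#', 's', 'silent'),
    ('#', 't', 'silent'), ('#', 'n', 'silent'),
]

QUESTIONS = {
    'R = h?': lambda L, R: R == 'h',
    'L = #?': lambda L, R: L == '#',
    'L = vowel?': lambda L, R: L in 'aeiou',
}

def log_likelihood(labels):
    # sum_i c_i * log2(c_i / n) over the label counts in one child.
    n = len(labels)
    counts = {lab: labels.count(lab) for lab in set(labels)}
    return sum(c * math.log2(c / n) for c in counts.values())

def best_question(data):
    # Score each pre-determined question by the total log likelihood
    # of the two children it induces; keep the maximizer.
    scores = {}
    for name, q in QUESTIONS.items():
        yes = [lab for L, R, lab in data if q(L, R)]
        no = [lab for L, R, lab in data if not q(L, R)]
        scores[name] = log_likelihood(yes) + log_likelihood(no)
    return max(scores, key=scores.get), scores

name, scores = best_question(DATA)
print(name, {k: round(v, 2) for k, v in scores.items()})
```

Run on the root node, this reproduces the numbers worked out on the later "p tree revisited" slides and selects R = h? as the best first split.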
60 Question Evaluation. The best question at a node is the question which maximizes the likelihood of the training data at that node after applying the question. Parent node: model as a single Gaussian N(μ, Σ); compute the likelihood L(data | μ, Σ). After applying question Q: Left child node: model as a single Gaussian N(μl, Σl); compute L(data_l | μl, Σl). Right child node: model as a single Gaussian N(μr, Σr); compute L(data_r | μr, Σr). Goal: find Q such that L(data_l | μl, Σl) x L(data_r | μr, Σr) is maximized.
61 Question Evaluation, cont'd. Let x1, x2, ..., xN be a sample of feature x, in which outcome a_i occurs c_i times. Let Q be a question which partitions this sample into left and right sub-samples of size n_l and n_r, respectively. Let c_i^l, c_i^r denote the frequency of a_i in the left and right sub-samples. The best question Q for feature x is the one which maximizes the conditional likelihood of the sample given Q, or, equivalently, maximizes the conditional log likelihood of the sample.
62 Log likelihood computation. The log likelihood of the data, given that we ask question Q, is:

log L(x1 .. xN | Q) = Σ_i c_i^l log p_i^l + Σ_i c_i^r log p_i^r

Using the maximum likelihood estimates p̂_i^l = c_i^l / n_l and p̂_i^r = c_i^r / n_r gives:

log L(x1 .. xN | Q) = Σ_i c_i^l log(c_i^l / n_l) + Σ_i c_i^r log(c_i^r / n_r)
= Σ_i c_i^l log c_i^l + Σ_i c_i^r log c_i^r - n_l log n_l - n_r log n_r

The best question is the one which maximizes this simple expression. c_i^l, c_i^r, n_l, and n_r are all non-negative integers, so the above expression can be computed very efficiently using a pre-computed table of n log n for non-negative integers n.
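The table-driven evaluation just described can be sketched as follows; the table size and the per-class count vectors are illustrative.

```python
import math

# Pre-computed table of n * log2(n); entry 0 is defined as 0 so empty
# counts contribute nothing.
NLOGN = [0.0] + [n * math.log2(n) for n in range(1, 1001)]

def split_log_likelihood(left_counts, right_counts):
    # log L = sum_i c_i^l log c_i^l + sum_i c_i^r log c_i^r
    #         - n_l log n_l - n_r log n_r
    # so each candidate question costs only a few table lookups.
    n_l, n_r = sum(left_counts), sum(right_counts)
    return (sum(NLOGN[c] for c in left_counts) - NLOGN[n_l]
            + sum(NLOGN[c] for c in right_counts) - NLOGN[n_r])

# Question A from the running 'p' example: counts are (p, f, silent).
print(round(split_log_likelihood([1, 4, 0], [3, 0, 4]), 2))
```

Putting all twelve examples on one side reproduces the root-node log likelihood of -19.0 from the "p tree, revisited" slide, and the split (1,4,0) / (3,0,4) reproduces question A's -10.5.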
63 Entropy. Entropy is a measure of uncertainty, measured in bits. Let x be a discrete random variable taking values a_1 .. a_N in an alphabet A of size N with probabilities p_1 .. p_N respectively. The uncertainty about what value x will take can be measured by the entropy of the probability distribution p = (p_1 .. p_N):

H = - Σ_{i=1}^{N} p_i log2 p_i

H >= 0, with H = 0 if and only if p_j = 1 for some j and p_i = 0 for all i != j. Entropy is maximized when p_i = 1/N for all i; then H = log2 N. Thus H tells us something about the sharpness of the distribution p. H can be interpreted as the theoretical minimum average number of bits required to encode/transmit the distribution.
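The definition above is a one-liner; the example distributions are illustrative.

```python
import math

def entropy(p):
    # H = -sum_i p_i * log2(p_i), in bits; terms with p_i = 0
    # contribute nothing (the p log p limit is 0).
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([1.0, 0.0, 0.0]))  # certain outcome: 0 bits
print(entropy([0.25] * 4))       # uniform over 4 values: log2 4 = 2 bits
```

The two extremes match the slide: a point-mass distribution has H = 0, and a uniform distribution over N values has H = log2 N.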
64 What does entropy look like for a binary variable? (figure: H(p) = -p log2 p - (1-p) log2(1-p), zero at p = 0 and p = 1, maximum of 1 bit at p = 1/2)
65 Entropy and Likelihood. Let x be a discrete random variable taking values a_1 .. a_N in an alphabet A of size N with probabilities p_1 .. p_N respectively. Let x_1 .. x_n be a sample of x in which a_i occurs c_i times. The sample likelihood is:

L = Π_{i=1}^{N} p_i^{c_i}

The maximum likelihood estimate of p_i is p̂_i = c_i / n. Thus, an estimate of the sample log likelihood is:

log L̂ = Σ_i c_i log p̂_i = n Σ_i p̂_i log p̂_i, and (1/n) log L̂ = Σ_i p̂_i log p̂_i = -Ĥ

Since n is a constant and log is a monotonic function, H is monotonically related to L. Therefore, maximizing likelihood is equivalent to minimizing entropy.
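The identity above is easy to check numerically; the sample string here is arbitrary.

```python
import math

def sample_stats(labels):
    # Returns (log2 likelihood of the sample under the ML estimates,
    # estimated entropy of the sample in bits).
    n = len(labels)
    counts = [labels.count(a) for a in set(labels)]
    log_l = sum(c * math.log2(c / n) for c in counts)
    h = -sum((c / n) * math.log2(c / n) for c in counts)
    return log_l, h

log_l, h = sample_stats(list("aabbbbcc"))
print(log_l / 8, -h)  # the per-sample log likelihood equals -H_hat
```

For this 8-symbol sample the per-sample log likelihood is -1.5 bits, exactly minus the estimated entropy, illustrating why maximizing likelihood and minimizing entropy pick the same question.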
66 p tree, revisited. p: peanut, pay, loophole, apple; c_p = 4. f: physics, photo, graph, telephone; c_f = 4. φ: apple, psycho, pterodactyl, pneumonia; c_φ = 4. n = 12. The log likelihood of the data at the root node is:

log L(x_1 .. x_12) = Σ_i c_i log2(c_i / n) = 4 log2(4/12) + 4 log2(4/12) + 4 log2(4/12) = -19.0

The average entropy at the root node is: H(x_1 .. x_12) = -(1/n) log2 L(x_1 .. x_12) = 19.0 / 12 = 1.58 bits.
67 p tree revisited: Question A. R = h? Yes: loophole (p); physics, telephone, graph, photo (f). No: peanut, pay, apple (p); apple, psycho, pterodactyl, pneumonia (φ). n_l = 5: c_p^l = 1, c_f^l = 4, c_φ^l = 0. n_r = 7: c_p^r = 3, c_f^r = 0, c_φ^r = 4.
68 p tree revisited: Question A, cont'd. With n_l = 5 (c_p^l = 1, c_f^l = 4, c_φ^l = 0) and n_r = 7 (c_p^r = 3, c_f^r = 0, c_φ^r = 4), the log likelihood of the data after applying question A is:

log L(x_1 .. x_12 | Q_A) = 4 log2 4 - 5 log2 5 + 3 log2 3 + 4 log2 4 - 7 log2 7 = -10.5

The average entropy of the data after applying question A is: H(x_1 .. x_12 | Q_A) = -(1/12) log2 L(x_1 .. x_12 | Q_A) = 10.5 / 12 = 0.87 bits. The increase in log likelihood due to question A is 19.0 - 10.5 = 8.5; the decrease in entropy is 1.58 - 0.87 = 0.71 bits. Knowing the answer to question A provides 0.71 bits of information about the pronunciation of p. A further 0.87 bits of information is still required to remove all the uncertainty about the pronunciation of p.
69 p tree revisited: Question B. L = φ (word boundary)? Yes: peanut, pay (p); physics, photo (f); psycho, pterodactyl, pneumonia (φ). No: loophole, apple (p); telephone, graph (f); apple (φ). n_l = 7: c_p^l = 2, c_f^l = 2, c_φ^l = 3. n_r = 5: c_p^r = 2, c_f^r = 2, c_φ^r = 1.
70 p tree revisited: Question B, cont'd. With n_l = 7 (c_p^l = 2, c_f^l = 2, c_φ^l = 3) and n_r = 5 (c_p^r = 2, c_f^r = 2, c_φ^r = 1), the log likelihood of the data after applying question B is:

log L(x_1 .. x_12 | Q_B) = 2 log2 2 + 2 log2 2 + 3 log2 3 - 7 log2 7 + 2 log2 2 + 2 log2 2 - 5 log2 5 = -18.5

The average entropy of the data after applying question B is: H(x_1 .. x_12 | Q_B) = -(1/12) log2 L(x_1 .. x_12 | Q_B) = 18.5 / 12 = 1.54 bits. The increase in log likelihood due to question B is 19.0 - 18.5 = 0.5; the decrease in entropy is 1.58 - 1.54 = 0.04 bits. Knowing the answer to question B provides 0.04 bits of information (very little) about the pronunciation of p.
71 p tree revisited: Question C. L = vowel? Yes: loophole, apple (p); telephone, graph (f). No: peanut, pay (p); physics, photo (f); apple, psycho, pterodactyl, pneumonia (φ). n_l = 4: c_p^l = 2, c_f^l = 2, c_φ^l = 0. n_r = 8: c_p^r = 2, c_f^r = 2, c_φ^r = 4.
72 p tree revisited: Question C, cont'd. With n_l = 4 (c_p^l = 2, c_f^l = 2, c_φ^l = 0) and n_r = 8 (c_p^r = 2, c_f^r = 2, c_φ^r = 4), the log likelihood of the data after applying question C is:

log L(x_1 .. x_12 | Q_C) = 2 log2 2 + 2 log2 2 - 4 log2 4 + 2 log2 2 + 2 log2 2 + 4 log2 4 - 8 log2 8 = -16.0

The average entropy of the data after applying question C is: H(x_1 .. x_12 | Q_C) = -(1/12) log2 L(x_1 .. x_12 | Q_C) = 16.0 / 12 = 1.33 bits. The increase in log likelihood due to question C is 19.0 - 16.0 = 3.0; the decrease in entropy is 1.58 - 1.33 = 0.25 bits. Knowing the answer to question C provides 0.25 bits of information about the pronunciation of p.
73 Comparison of Questions A, B, C. Log likelihood of the data given the question: A: -10.5; B: -18.5; C: -16.0. Average entropy (bits) of the data given the question: A: 0.87; B: 1.54; C: 1.33. Gain in information (bits) due to the question: A: 0.71; B: 0.04; C: 0.25. These measures all say the same thing: question A is best, question C is 2nd best, question B is worst.
74 Best Question. In general, we seek questions which maximize the likelihood of the training data given some model. The expression to be maximized depends on the model. In the previous example the model is a multinomial distribution. Another example: context-dependent prototypes use a continuous distribution.
75 Best Question: Context-Dependent Prototypes. For context-dependent prototypes, we use a decision tree for each arc in the Markov model. At each leaf of the tree is a continuous distribution which serves as a context-dependent model of the acoustic vectors associated with the arc. The distributions and the tree are created from acoustic feature vectors aligned with the arc. We grow the tree so as to maximize the likelihood of the training data (as always), but now the training data are real-valued vectors. We estimate the likelihood of the acoustic vectors during tree construction using a diagonal Gaussian model. When tree construction is complete, we replace the diagonal Gaussian models at the leaves by more accurate models to serve as the final prototypes. Typically we use mixtures of diagonal Gaussians.
76 Diagonal Gaussian Likelihood. Let Y = y_1 y_2 .. y_n be a sample of independent d-dimensional acoustic vectors arising from a diagonal Gaussian distribution with means μ_j and variances σ_j^2. Then

log L(Y | DG(μ, σ^2)) = -(1/2) Σ_{i=1}^{n} Σ_{j=1}^{d} { log(2π σ_j^2) + (y_ij - μ_j)^2 / σ_j^2 }

The maximum likelihood estimates of μ_j and σ_j^2 are:

μ̂_j = (1/n) Σ_{i=1}^{n} y_ij, for j = 1, 2, .., d
σ̂_j^2 = (1/n) Σ_{i=1}^{n} (y_ij - μ̂_j)^2, for j = 1, 2, .., d

Hence, an estimate of log L(Y) is

log L(Y | DG(μ̂, σ̂^2)) = -(1/2) Σ_{i=1}^{n} Σ_{j=1}^{d} { log(2π σ̂_j^2) + (y_ij - μ̂_j)^2 / σ̂_j^2 }
77 Diagonal Gaussian Likelihood, cont'd. Now,

Σ_{i=1}^{n} (y_ij - μ̂_j)^2 / σ̂_j^2 = ( Σ_{i=1}^{n} y_ij^2 - n μ̂_j^2 ) / σ̂_j^2 = n σ̂_j^2 / σ̂_j^2 = n, for j = 1, 2, .., d

Hence

log L(Y | DG(μ̂, σ̂^2)) = -(1/2) Σ_{j=1}^{d} { n log(2π σ̂_j^2) + n } = -(n/2) Σ_{j=1}^{d} { log(2π σ̂_j^2) + 1 }
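The collapse of the quadratic term can be checked numerically: with the ML estimates plugged in, the full double sum and the closed form -(n/2) Σ_j {log(2π σ̂_j²) + 1} agree. The data below is arbitrary.

```python
import math

def diag_gaussian_loglik(Y):
    # Y is a list of equal-length vectors. Returns the log likelihood
    # computed two ways: the full double sum, and the closed form.
    n, d = len(Y), len(Y[0])
    mu = [sum(y[j] for y in Y) / n for j in range(d)]
    var = [sum((y[j] - mu[j]) ** 2 for y in Y) / n for j in range(d)]
    direct = -0.5 * sum(
        math.log(2 * math.pi * var[j]) + (y[j] - mu[j]) ** 2 / var[j]
        for y in Y for j in range(d))
    closed = -0.5 * n * sum(math.log(2 * math.pi * v) + 1 for v in var)
    return direct, closed

direct, closed = diag_gaussian_loglik([[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
print(direct, closed)  # identical up to floating-point rounding
```

This is why, during tree growing, only the per-dimension variances of each sub-sample are needed to score a split.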
78 Diagonal Gaussian Splits. Let Q be a question which partitions Y into left and right sub-samples Y_l and Y_r, of size n_l and n_r. The best question is the one which maximizes log L(Y_l) + log L(Y_r). Using a diagonal Gaussian model:

log L(Y_l | DG(μ̂_l, σ̂_l^2)) + log L(Y_r | DG(μ̂_r, σ̂_r^2))
= -(n_l/2) Σ_{j=1}^{d} { log(2π σ̂_lj^2) + 1 } - (n_r/2) Σ_{j=1}^{d} { log(2π σ̂_rj^2) + 1 }
= -(d/2) { n log(2π) + n } - (1/2) { n_l Σ_j log σ̂_lj^2 + n_r Σ_j log σ̂_rj^2 }

where the first term (with n = n_l + n_r) is common to all splits.
79 Diagonal Gaussian Splits, cont'd. Thus, the best question Q minimizes:

D_Q = n_l Σ_{j=1}^{d} log σ̂_lj^2 + n_r Σ_{j=1}^{d} log σ̂_rj^2

where

σ̂_lj^2 = (1/n_l) Σ_{y ∈ Y_l} y_j^2 - ( (1/n_l) Σ_{y ∈ Y_l} y_j )^2, for j = 1, 2, .., d

and similarly for σ̂_rj^2. D_Q involves little more than summing vector elements and their squares.
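The criterion D_Q above can be sketched directly from running sums of the vectors and their squares; the data and the two candidate splits below are arbitrary illustrations.

```python
import math

def d_q(Y_left, Y_right):
    # D_Q = n_l * sum_j log(var_lj) + n_r * sum_j log(var_rj),
    # with each variance computed from the sum and sum-of-squares.
    def term(Y):
        n, d = len(Y), len(Y[0])
        total = 0.0
        for j in range(d):
            s = sum(y[j] for y in Y)
            sq = sum(y[j] ** 2 for y in Y)
            var = sq / n - (s / n) ** 2   # sigma_hat^2 from running sums
            total += math.log(var)
        return n * total
    return term(Y_left) + term(Y_right)

# Lower D_Q means higher split likelihood: separating the two clumps
# should beat a mixed-up split of the same data.
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
good = d_q(data[:3], data[3:])
bad = d_q([data[0], data[3], data[1]], [data[4], data[2], data[5]])
print(good < bad)
```

The clean split yields small per-side variances and hence a much smaller D_Q, which is exactly why minimizing D_Q maximizes the split likelihood.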
80 How Big a Tree?

CART suggests cross-validation:
- Measure performance on a held-out data set
- Choose the tree size that maximizes the likelihood of the held-out data

In practice, simple heuristics seem to work well. A decision tree is fully grown when no terminal node can be split. Reasons for not splitting a node include:
1. Insufficient data for accurate question evaluation
2. Best question was not very helpful / did not improve the likelihood significantly
3. Cannot cope with any more nodes due to CPU/memory limitations
81 Instability

Decision trees have the undesirable property that a small change in the data can result in a large difference in the final tree. Consider a 2-class problem, $w_1$ and $w_2$, where the observations for each class are 2-dimensional. [Figure: scatter plot of the two classes, with one training point marked for perturbation.]
82-83 [Figures: the decision trees grown before and after slightly perturbing a single training point. The split questions change substantially (e.g. thresholds such as $x_2 < 0.3$, $x_2 < 0.6$, $x_1 < 0.35$ in the first tree versus $x_2 < 0.09$, $x_2 < 0.6$ in the second), even though only one observation moved.]
84 Bagging

Create multiple models by training the same learner on different samples of the training data:
- Given a training set of size n, create m different training sets of size n by sampling with replacement
- Combine the m models using a simple majority vote

Can be applied to any learning method, including decision trees. Decreases the generalization error by reducing variance in the results for unstable learners (i.e., when the hypothesis can change dramatically if the training data is slightly altered).
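The recipe above (bootstrap resamples plus majority vote) fits in a few lines. A minimal Python sketch, with a hypothetical 1-nearest-neighbour base learner standing in for a decision tree:

```python
import random
from collections import Counter

def bagging_predict(train, x, base_learner, m=25, seed=0):
    """Bagging sketch: train m models on bootstrap resamples of `train`
    (sampling with replacement, same size n) and combine their
    predictions on x by simple majority vote."""
    rng = random.Random(seed)
    n = len(train)
    models = []
    for _ in range(m):
        bootstrap = [train[rng.randrange(n)] for _ in range(n)]
        models.append(base_learner(bootstrap))
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def nn_learner(train):
    """A deliberately unstable base learner for illustration (not from the
    lecture): 1-nearest-neighbour on a 1-D feature."""
    def predict(x):
        return min(train, key=lambda ex: abs(ex[0] - x))[1]
    return predict
```

`base_learner` here is any function mapping a training set to a `predict(x)` function, so a decision-tree learner can be dropped in unchanged.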
86 Boosting

Another method for producing multiple models by repeatedly altering the data given to a learner. Examples are given weights, and at each iteration a new hypothesis is learned and the examples are reweighted to focus on those that the latest hypothesis got wrong. During testing, each of the hypotheses gets a weighted vote proportional to its training set accuracy.
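A concrete instance of this scheme is AdaBoost, where the "vote proportional to accuracy" becomes the standard weight $\alpha = \frac{1}{2}\log\frac{1-\mathrm{err}}{\mathrm{err}}$. The sketch below uses a simple 1-D threshold stump as an illustrative weak learner (neither the stump nor the ±1 labels come from the lecture):

```python
import math

def stump_learner(train, w):
    """Weak learner for 1-D examples (x, y), y in {+1, -1}: exhaustively
    picks the threshold/sign stump with the lowest weighted error."""
    xs = sorted(set(x for x, _ in train))
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    best = None
    for t in thresholds:
        for sign in (1, -1):
            err = sum(wi for wi, (x, y) in zip(w, train)
                      if (sign if x > t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: sign if x > t else -sign

def adaboost(train, base_learner, rounds=10):
    """AdaBoost sketch: examples carry weights; each round fits a hypothesis
    to the weighted data, then reweights to emphasize the examples that
    hypothesis got wrong.  Returns a weighted-vote classifier."""
    n = len(train)
    w = [1.0 / n] * n
    ensemble = []  # (alpha, hypothesis) pairs
    for _ in range(rounds):
        h = base_learner(train, w)
        err = sum(wi for wi, (x, y) in zip(w, train) if h(x) != y)
        if err <= 0 or err >= 0.5:
            if err <= 0:  # perfect weak hypothesis: give it a dominant vote
                ensemble.append((10.0, h))
            break
        alpha = 0.5 * math.log((1 - err) / err)  # hypothesis weight
        ensemble.append((alpha, h))
        # up-weight mistakes, down-weight correct examples, renormalize
        w = [wi * math.exp(-alpha * y * h(x)) for wi, (x, y) in zip(w, train)]
        z = sum(w)
        w = [wi / z for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

On the classic 3-point example (+, −, + along a line), no single stump is correct, but three boosted stumps classify all points correctly.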
87 Strengths & Weaknesses of Decision Trees

Strengths:
- Easy to generate; simple algorithm
- Relatively fast to construct
- Classification is very fast
- Can achieve good performance on many tasks

Weaknesses:
- Not always sufficient to learn complex concepts
- Can be hard to interpret; real problems can produce large trees
- Some problems with continuously valued attributes may not be easily discretized
- Data fragmentation
88 Putting It All Together

Given a word sequence, we can construct the corresponding Markov model by:
1. Rewriting the word string as a sequence of phonemes
2. Concatenating phonetic models
3. Using the appropriate tree for each arc to determine which allo-arc (leaf) is to be used in that context
89 Example

"The rain in Spain falls." Look these words up in the dictionary to get:

DH AX R EY N IX N S P EY N F AA L Z

Rewrite phones as states according to the phonetic model:

DH1 DH2 DH3 AX1 AX2 AX3 R1 R2 R3 EY1 EY2 EY3 ...

Using phonetic context, descend the decision tree to find leaf sequences:

DH1_5 DH2_7 DH3_4 AX1_53 AX2_37 ...

Use the Gaussian mixture model for the appropriate leaf as the observation probabilities for each state in the Hidden Markov Model.
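The three steps above can be sketched as a small pipeline. The toy lexicon, the three-states-per-phone naming, and `ask_tree()` are illustrative stand-ins, not the actual lexicon or trees from the lecture:

```python
# Toy lexicon: a tiny stand-in for a real pronunciation dictionary.
TOY_DICT = {"the": ["DH", "AX"], "rain": ["R", "EY", "N"]}

def word_to_leaves(words, ask_tree):
    """Map a word sequence to a sequence of context-dependent leaves."""
    phones = [p for w in words for p in TOY_DICT[w]]      # 1. words  -> phones
    states = [(p, k) for p in phones for k in (1, 2, 3)]  # 2. phones -> states
    leaves = []
    for i, (p, k) in enumerate(states):                   # 3. states -> leaves
        # phonetic context for the tree: here just the neighbouring phones,
        # with sentence-boundary markers at the edges
        left = states[i - 1][0] if i > 0 else "<s>"
        right = states[i + 1][0] if i + 1 < len(states) else "</s>"
        leaves.append(ask_tree(p, k, left, right))
    return leaves
```

In a real system `ask_tree` would descend the decision tree attached to each arc, asking questions about several phones of context; here it is any callable taking the state and its neighbours.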
90 Course Feedback

- Was this lecture mostly clear or unclear? What was the muddiest topic?
- Other feedback (pace, content, atmosphere, etc.)