THE MARKOV SELECTION MODEL FOR CONCURRENT SPEECH RECOGNITION

Paris Smaragdis, Adobe Systems Inc., Cambridge, MA, USA
Bhiksha Raj, Walt Disney Research, Pittsburgh, PA, USA (also with the Language Technologies Institute, School of Computer Science, Carnegie Mellon University)

ABSTRACT

In this paper we introduce a new Markov model that is capable of recognizing speech from recordings of simultaneously speaking, a priori known speakers. This work builds on recent work on non-negative representations of spectrograms, which have been shown to be very effective in source separation problems. We extend these approaches to design a Markov selection model that is able to recognize sequences even when they are presented mixed together, and we do so without the need to perform separation on the signals. Unlike factorial Markov models, which have been used similarly in the past, this approach features a low computational complexity in the number of sources and Markov states, which makes it a highly efficient alternative. We demonstrate the use of this framework in recognizing speech from mixtures of known speakers.

1. INTRODUCTION

Speech recognition of concurrent speakers is a significantly hard task. Although contrived as a scenario, its solution can be of great use to noise-robust speech recognition and can provide insight into how to deal with some of the hardest problems in acoustic sensing. Current models for speech recognition cannot be easily extended to deal with additive interference, and often need to be complemented with a separation algorithm that preprocesses the data before recognition takes place. This is often a risky combination, since the output of a separation algorithm is not guaranteed to be recognizable speech, at least not by a speech recognition system. A different, temporally sensitive approach characterizes the speech from all concurrent sources (speakers) by HMMs. The sum of the speech is then characterized by a factorial HMM, which is essentially a product of the HMMs representing the individual sources. Inference can be run on this factorial HMM to determine what was spoken by individual speakers. The problem, of course, is computational complexity: the number of states in the factorial HMM is the product of the states in the HMMs for the individual sources, i.e. it is polynomial of order equal to the number of concurrent sources. The time taken for inference is polynomial of order twice the number of sources, requiring complicated variational methods to make it tractable.

In this paper we introduce a Markov selection model coupled with a probabilistic decomposition that allows us to recognize additive speech mixtures in time that is linear in the number of concurrent sources. We make use of speaker-dependent models which, when used on speech mixtures, can obtain reliable estimates of all the spoken utterances. In the following sections we describe an additive model which has been used in the past for source separation, then describe how it can be incorporated into an HMM-based speech recognition framework to form the proposed Markov selection model. Finally, we present results from experiments based on the ICSLP 2006 speech separation challenge data [1].

2. ADDITIVE MODELS OF SOUNDS

Recently we have seen wide use of non-negative factorization methods with applications to source separation from single-channel recordings ([2, 3, 4]). In this paper we will make use of such a model and adapt it for use in a speech recognition system. Let us begin by reviewing how these models work, and also set up the notation used henceforth.
Many modern source separation methods use prior knowledge of the sources in a mixture. A common scenario is one where, for two speakers a and b, we have training recordings x_a(t) and x_b(t), and a mixture m(t) = y_a(t) + y_b(t). Our goal is then to use the information extracted from x_a(t) and x_b(t) to estimate y_a(t) and y_b(t) by observing only m(t). A very efficient and capable approach to this task is to use non-negative spectrum factorization methods. Here we will use a probabilistic version of these techniques, which allows us to later incorporate it into a Markov model. In this setting, we extract the spectral magnitude of the observed signals at regularly sampled analysis frames:

    X_t(f) = |DFT( x(T(t-1)+1, ..., Tt) )|,    (1)

where T is the size of the analysis frame we choose to use. Doing so we can obtain X_{a,t} and X_{b,t}, the magnitude spectra for signals from speakers a and b.
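As a concrete illustration of Equation 1, the framing and magnitude-spectrum computation can be sketched in a few lines of NumPy. This is a minimal sketch under assumed choices (non-overlapping rectangular frames, a real FFT, and a frame size T of 512 samples); it is not the exact front end used in the paper.

    import numpy as np

    def magnitude_spectra(x, T=512):
        """Equation 1: cut x into consecutive frames of length T and return the
        magnitude spectrum |DFT(.)| of each frame (rows = frames t, columns = f)."""
        n_frames = len(x) // T
        frames = x[: n_frames * T].reshape(n_frames, T)
        return np.abs(np.fft.rfft(frames, axis=1))

With this helper, magnitude_spectra(x_a) and magnitude_spectra(x_b) would play the roles of X_{a,t} and X_{b,t} above.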

We model magnitude spectra as histograms drawn from a mixture of multinomial distributions. This leads to the following latent variable model:

    X_t(f) ~ \sum_z P(f|z) P_t(z),    (2)

where the symbol ~ represents drawing from a distribution, P(f|z) represents the z-th component multinomial, and P_t(z) is the probability with which it is mixed to produce X_t, the magnitude spectrum vector for the t-th analysis frame. M is the total number of component multinomials. The component multinomials (which we will refer to as multinomial bases) P(f|z) for any speaker and the corresponding mixture weights P_t(z) for each spectral vector can now be estimated using an Expectation-Maximization algorithm. This is essentially a simplified PLSI model [5], but looking past its probabilistic formulation we note that P(f|z) is in fact a normalized spectrum. The set of all multinomials can thus be viewed as a dictionary of spectral bases, Equation 2 can be viewed as an algebraic decomposition, and M as the rank of this decomposition. P_t(z) can be seen as weights that tell us how to put the dictionary elements together to approximate the input at hand. Thus Equation 2 can be written as:

    X_t(f) \approx \hat{X}_t(f) = g_t \sum_{z=1}^{M} P(f|z) P_t(z),    (3)

where g_t = \sum_f X_t(f). The scalar g_t ensures that the eventual approximation is scaled appropriately to match the input. This can also be thought of as a non-negative matrix factorization [6] in which P(f|z) and P_t(z) correspond to the two non-negative factors.

There are two key observations we need to make in order to be able to extract y_a(t) and y_b(t) from m(t). The first one is that, in general, it will hold that:

    M_t(f) \approx Y_{a,t}(f) + Y_{b,t}(f).    (4)

This means that the magnitude spectrogram of the mixture of the two sources will be approximately equal to the sum of the magnitude spectrograms of the two sources. Although due to phase cancellations we cannot achieve exact equality, this assumption has been used with great success so far and is largely correct for most practical purposes. The second observation is that the multinomial bases P_a(f|z), which we can estimate from X_a, describe Y_a better than the bases P_b(f|z) estimated from X_b, and vice versa, i.e.:

    D_{KL}( Y_{a,t} \,\|\, g_t \sum_z P_a(f|z) P_t(z) ) < D_{KL}( Y_{a,t} \,\|\, g_t \sum_z P_b(f|z) P_t(z) ),    (5)

and vice versa, where D_{KL}(\cdot \| \cdot) denotes the Kullback-Leibler divergence, P_a(f|z) and P_b(f|z) are the dictionaries learned from x_a and x_b, and each P_t(z) is the optimal weight distribution given each of the two dictionaries. These two observations then allow us to assume that the observed mixture M_t(f) can be explained well using both dictionaries P_a(f|z) and P_b(f|z):

    M_t(f) \approx g_t^{(a)} \sum_z P_a(f|z) P_t(z) + g_t^{(b)} \sum_z P_b(f|z) P_t(z),    (6)

for two optimally selected instances of P_t(z). In addition to being well explained, we would expect most of the energy of each speaker to be represented by the part of this summation that includes the multinomial bases for that speaker.

For both the dictionary learning and the weight estimation parts, we can use the Expectation-Maximization algorithm to estimate any of the quantities in the above equations.

Fig. 1. (a) A graphical representation of a conventional HMM. The state at each time is dependent on the state at the previous time and generates the observation. Dotted arrows indicate injection of parameters. (b) The proposed Markov selection model. The state selects the multinomial bases that generate the observation. The bases are mixed by a weight w. This additional dependence is highlighted by the dotted outline.

The update equations for any dictionary element P(f|z) and its corresponding weight P_t(z) for an input X_t(f) are:

    P_t(z) = \frac{\sum_f P_t(z|f) X_t(f)}{\sum_{z'} \sum_f P_t(z'|f) X_t(f)},    (7)

    P(f|z) = \frac{\sum_t P_t(z|f) X_t(f)}{\sum_{f'} \sum_t P_t(z|f') X_t(f')},    (8)

where:

    P_t(z|f) = \frac{P(f|z) P_t(z)}{\sum_{z'} P(f|z') P_t(z')}.    (9)

The dictionary of multinomial bases for each of the sources may be learned from separate training data. These can then be used to decompose mixed recordings (i.e., to find the mixture weights P_t(z) for all bases). Once the decomposition in Equation 6 is achieved, we can recompose Y_{a,t}(f) and Y_{b,t}(f) separately and invert them back to the time domain to obtain our separated estimates of y_a(t) and y_b(t). The performance of this approach can vary depending on various details we do not present here, but in its better forms it has been shown to achieve strong suppression of unwanted speakers [2, 4].
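The EM updates of Equations 7-9 can be sketched directly in NumPy. The sketch below learns a dictionary from a magnitude spectrogram, or, when a dictionary is passed in and held fixed, estimates only the frame-wise weights, which is how a mixture is decomposed against stacked speaker dictionaries as in Equation 6. The number of bases, iteration count and random initialization are illustrative assumptions, not values taken from the paper.

    import numpy as np

    def plsi_decompose(X, M=20, n_iter=200, P_fz=None, seed=0):
        """EM updates of Equations 7-9 on a magnitude spectrogram X (frames x freqs).

        If P_fz (freqs x bases) is given it is held fixed and only the weights
        P_t(z) are estimated; otherwise a dictionary of M bases is learned."""
        T, F = X.shape
        rng = np.random.default_rng(seed)
        learn_dict = P_fz is None
        if learn_dict:
            P_fz = rng.random((F, M)) + 1e-3
            P_fz /= P_fz.sum(axis=0, keepdims=True)
        Z = P_fz.shape[1]
        P_tz = np.full((T, Z), 1.0 / Z)
        for _ in range(n_iter):
            # Equation 9: posterior P_t(z|f), shape (frames, freqs, bases)
            joint = P_tz[:, None, :] * P_fz[None, :, :]
            post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
            weighted = post * X[:, :, None]          # P_t(z|f) X_t(f)
            # Equation 7: weight update P_t(z), normalized over z for every frame
            P_tz = weighted.sum(axis=1)
            P_tz /= P_tz.sum(axis=1, keepdims=True) + 1e-12
            if learn_dict:
                # Equation 8: dictionary update P(f|z), normalized over f per basis
                P_fz = weighted.sum(axis=0)
                P_fz /= P_fz.sum(axis=0, keepdims=True) + 1e-12
        return P_fz, P_tz

Under these assumptions, decomposing a mixture spectrogram X_mix against two pre-learned dictionaries amounts to plsi_decompose(X_mix, P_fz=np.hstack([P_a, P_b])), after which the first block of weight columns accounts for speaker a and the rest for speaker b.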
However, since the objective of this paper is to recognize rather than to separate, it is more beneficial to perform recognition directly with this model instead of separating and then recognizing. To do so we will incorporate this model into a hidden Markov model structure, as shown in the following section.

3. THE MARKOV SELECTION MODEL

3.1. Model definition

In this section we introduce an application of the model and the observations made in the previous section to temporal data. A conventional Hidden Markov Model is a doubly-stochastic model comprising an underlying Markov chain and observation probability densities at each state in the chain.

The parameters characterizing the model are the initial state probabilities \Pi = {P(s) \forall s}, representing the probabilities of beginning a Markov chain at each state; a transition matrix T = {P(s_i|s_j) \forall s_i, s_j}, representing the set of all transition probabilities between every pair of states; and a set of state output distributions B = {P(x|s) \forall s}, representing the probability of generating observations from each of the states. The graphical model for the HMM is shown in Figure 1(a).

The model we propose extends this conventional model as shown in Figure 1(b). Instead of states generating observations directly, they generate the labels z_s = {z} of the sets of multinomial bases that will produce observations. Thus, the output distributions of the HMM are B = {P(z_s|s) \forall s}. To generate observations, the multinomial bases in z_s are mixed according to weights w. The vector of weights for all bases, w, which actually represents a multinomial over z, is drawn from a distribution which, for this paper, we will assume is uniform. Only the bases selected by the state (and their weights, appropriately normalized) are used to generate the final observation. Since the underlying Markov process contributes to data generation primarily by selecting bases, we refer to this as the Markov Selection Model. Figure 2 illustrates the generation process with an example.

Fig. 2. A two-state left-to-right Markov model is shown at the top. Each state selects one pair of multinomial bases. The two bases that describe each state are shown left and right as P(f|z_i). The bottom of the figure displays the input X_t(f) that this model can best describe: the left part is best described as a mixture of P(f|z_1) and P(f|z_2), and the right part by P(f|z_3) and P(f|z_4).

A key aspect of the above model is that the weights w are not fixed but are themselves drawn for every observation. Further, the draw of the weights is not dependent on the state in any manner, but is independent. The actual probability of an observation depends on the mixture weights. Thus, in order to compute the complete likelihood of an observation we must integrate the product of the weight-dependent likelihood of the observation and the probability of drawing the mixture weight vector over the entire probability simplex on which w resides.

We note that the primary use of this model is inferring the underlying state sequence. To do so, it is sufficient to determine the Markov-chain-independent a posteriori probabilities P_{ind}(s|x) of the states and utilize those for estimating the state sequence; the actual observation probability P(x|s) is not required. Indeed, this observation is also utilized in several approaches to HMM-based speech recognition where the Markov-chain-independent a posteriori probabilities of states are obtained through models such as neural networks [7] for inference of the underlying word sequence.

As a first step, instead of explicitly integrating over the space of all weights to obtain the likelihood of the observation, we will use the Markov-chain-independent a posteriori state probability for all inference and learning of Markov chain parameters. Secondly, we will approximate the a posteriori state probability by the sum of the a posteriori most likely mixture weights for the multinomial bases selected by any state, i.e. we use the approximation:

    \hat{P}(z|X_t) = [\arg\max_w P(w|X_t)]_z = P_t(z),    (10)

    P_{ind}(s|X_t) = \sum_{z \in s} \hat{P}(z|X_t) = \sum_{z \in s} P_t(z),    (11)

where P_t(z) is the same value referred to in Equation 7. In other words, we derive the mixture weights that maximize the likelihood of the portion of the graph enclosed by the dashed outline in Figure 1(b). We do so without reference to the Markov chain, and utilize them to compute the Markov-chain-independent conditional probabilities for the states, which will be used in the inference. This effectively factors the observation dependency and the state dependency of the model.
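To make the approximation of Equations 10 and 11 concrete, the following sketch turns the frame-wise weights P_t(z) (as estimated by the EM sketch of Section 2) into Markov-chain-independent state scores. Representing the selection sets as lists of basis indices is an assumed data layout, not something prescribed by the paper.

    import numpy as np

    def state_scores(P_tz, select):
        """Equation 11: P_ind(s|X_t) is the sum of the weights P_t(z) of the bases
        selected by state s.  P_tz: (frames, bases); select[s]: list of indices of
        the bases chosen by state s."""
        scores = np.zeros((P_tz.shape[0], len(select)))
        for s, idx in enumerate(select):
            scores[:, s] = P_tz[:, idx].sum(axis=1)
        return scores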
Thus, one of the advantages conferred by the approximation is that the model of Figure 1(b) gets factored into two parts. The first (enclosed by the dashed outline in the figure) is essentially a PLSA model that obtains w_ml and thereby P_t(z). The second, given the P_t(z) computed from the first part, is effectively an HMM with P_{ind}(s|X_t) as state output densities. Inference and learning can run largely independently in the two components, with the PLSA component employed to learn its parameters, while the HMM can use the standard Baum-Welch training procedure [8] to learn the Markov chain parameters \Pi and T. The two components must, however, combine for learning the multinomial bases P(f|z).

3.2. Parameter estimation

We train this structure by appropriately adapting the Baum-Welch training procedure [8] to use the new state model. Due to space constraints in this format, we present the learning process at a superficial level. In the first step we compute the emission probability terms for each state. Since this is locally also a maximum likelihood estimate, we estimate an intermediate value of the optimal weight vector by:

    P_t(z|f) = \frac{P(f|z) P_t(z)}{\sum_{z'} P(f|z') P_t(z')},    (12)

    P_t(z) = \frac{\sum_f P_t(z|f) X_t(f)}{\sum_{z'} \sum_f P_t(z'|f) X_t(f)}.    (13)

Note that the above estimation does not refer to the underlying Markov chain or its states; all computations are local to the components within the dotted outline of Figure 1(b). Once P_t(z) has been obtained, we compute the posterior state probability P_{ind}(s|X_t) using Equation 11. The forward-backward algorithm can then be employed as in a conventional HMM. Forward probabilities \alpha, backward probabilities \beta and state posteriors \gamma are given by the recursions:

    \alpha_t(s) = \sum_{s'} \alpha_{t-1}(s') T_{s',s} P(X_t|s),
    \beta_t(s) = \sum_{s'} \beta_{t+1}(s') T_{s,s'} P(X_{t+1}|s'),
    \gamma_t(s) = \frac{\alpha_t(s) \beta_t(s)}{\sum_{s'} \alpha_t(s') \beta_t(s')}.    (14)
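A sketch of the recursions of Equation 14, with the Markov-chain-independent scores of Equation 11 standing in for the emission terms P(X_t|s). The per-frame rescaling is a standard numerical-stability device added in this sketch; it is not part of the equations above.

    import numpy as np

    def forward_backward(pi, T_mat, emit):
        """Equation 14.  pi: initial probabilities (S,); T_mat[i, j] = P(s_j|s_i);
        emit: emission scores per frame and state, shape (frames, S).
        Returns alpha, beta and the state posteriors gamma."""
        n_frames, S = emit.shape
        alpha = np.zeros((n_frames, S))
        beta = np.zeros((n_frames, S))
        alpha[0] = pi * emit[0]
        alpha[0] /= alpha[0].sum() + 1e-12
        for t in range(1, n_frames):
            alpha[t] = (alpha[t - 1] @ T_mat) * emit[t]
            alpha[t] /= alpha[t].sum() + 1e-12      # rescale to avoid underflow
        beta[-1] = 1.0
        for t in range(n_frames - 2, -1, -1):
            beta[t] = T_mat @ (emit[t + 1] * beta[t + 1])
            beta[t] /= beta[t].sum() + 1e-12
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True) + 1e-12
        return alpha, beta, gamma

The returned gamma are the \gamma_t(s) that weight the dictionary update in the Maximization step described next.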

Fig. 3. Model for producing a mixture of two sources. Each source follows its own Markov chain. Each source's state selects some subset of multinomial bases. The mixture weights for all bases (including both sources) are independently drawn. The final observation is generated by the weighted mixture of bases selected by both sources.

In the Maximization step we need to estimate all the dictionary elements P(f|z). To do so we use the state posteriors to appropriately weight Equation 8 and obtain:

    P(f|z) = \frac{\sum_t \sum_{s: z \in s} \gamma_t(s) P_t(z|f) X_t(f)}{\sum_{f'} \sum_t \sum_{s': z \in s'} \gamma_t(s') P_t(z|f') X_t(f')},    (15)

where s : z \in s represents the set of states which can select basis z. Update rules for the transition matrix T and the initial state probabilities are the same as with traditional HMM models.

The most prevalent problem we encounter during training is that of strong local optima, which can cause convergence towards a poor solution. This can happen, for example, when the multinomial bases of the terminal state adapt faster towards explaining the first few input time points. Since this is a potentially serious problem, we need to ensure that convergence of the dictionary elements is not too rapid, so that there is a significant likelihood that dictionary elements can switch across states if needed. This can be easily achieved by imposing an anti-sparsity prior on the activation of the dictionary elements. We do so by using a Dirichlet prior over the mixture weights of all P(f|z), with hyper-parameters \alpha_j that are slowly annealed during training. This measure ensures that we obtain consistent results over multiple runs and that we do not converge on blatantly wrong local optima.

3.3. Sequence Estimation

The procedure for computing the optimal state sequence, given all model parameters, is straightforward. For each observation we compute the emission probability for each state through the EM estimation of Equations 12 and 13, and Equation 11. The Viterbi algorithm can then be used to find the optimal state sequence. The two examples in Figure 4 illustrate the results obtained, both from the learning and from the state sequence estimation. For each example, a three-state model of the proposed architecture is learned. We then obtain the optimal state sequence for each data sequence using the model estimated from it. The state segmentations obtained are shown in the bottom plots of Figure 4. As we can observe, the segmentation obtained is intuitive: each of the states captures a locally consistent region of the data.
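For completeness, the Viterbi decoding step can be sketched as below. This is a generic log-domain implementation that operates on the same state scores used above; it is included only so that the later decoding sketches are self-contained, and is not claimed to be the authors' implementation.

    import numpy as np

    def viterbi(pi, T_mat, emit):
        """Most likely state sequence for emission scores emit (frames, S),
        initial probabilities pi (S,) and transition matrix T_mat (S, S).
        Computed in the log domain; small constants guard against log(0)."""
        n_frames, S = emit.shape
        log_T = np.log(T_mat + 1e-300)
        delta = np.log(pi + 1e-300) + np.log(emit[0] + 1e-300)
        back = np.zeros((n_frames, S), dtype=int)
        for t in range(1, n_frames):
            cand = delta[:, None] + log_T        # cand[i, j]: best score ending in i, then i -> j
            back[t] = cand.argmax(axis=0)
            delta = cand.max(axis=0) + np.log(emit[t] + 1e-300)
        path = np.zeros(n_frames, dtype=int)
        path[-1] = delta.argmax()
        for t in range(n_frames - 2, -1, -1):
            path[t] = back[t + 1, path[t + 1]]
        return path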
4. MODELING MIXTURES OF SOUNDS

The main advantage of the proposed model becomes apparent when we use it to analyze the sum of the outputs of two separate processes. Let X_{a,t}(f) and X_{b,t}(f) be two data sequences obtained separately from two sources that are well modeled by the model of Section 3, and let the actual observation be X_t(f) = X_{a,t}(f) + X_{b,t}(f). Then it is relatively straightforward to show that the statistical model for X_t(f) is given by Figure 3. As before, each of the two sources follows its own independent Markov chain. The state output distributions for each source are selector functions, as in the single-source case. The primary difference lies in the manner in which the summed data are generated. An independent process now draws a mixture weight vector that includes mixture weights for all bases of both sources. The final observation is obtained by mixing the bases selected by the states of both sources using the drawn mixture weights.

The problem of estimating the state sequences of the individual sources is now easily solved. Using the same approximation as in Section 3, we first compute the optimal weights for all bases using iterations of Equations 12 and 13. These iterations compute the P_t(z) for all bases from all sources. Once these are computed, the Markov-chain-independent a posteriori state probabilities for each of the states of the Markov models of both sources are computed using Equation 11 as follows:

    P(s|X_t^{(i)}) = \sum_{z \in s} P_t(z),    (16)

where X_t^{(i)} is the i-th source at time step t, s is any state in the Markov model for the i-th source, and the sum runs over the set of bases selected by that state. Remarkably, the model permits us to compute the state emission probabilities for the individual sources given only the sum of their outputs. The optimal state sequences for the individual sources can now be obtained independently by the Viterbi algorithm. As a result, the complexity of this process, given K sources each modeled by N states, is O(KN^2), equivalent to performing K independent Viterbi decodes. This is in contrast to the conventional factorial approach to modeling the mixture of multiple sources, where the resulting model has N^K states and Viterbi estimation of the optimal state sequence requires O(N^{2K}) operations, necessitating complex variational approximations to simplify the problem.

Figure 5 illustrates the proposed procedure. The top plot is a mixed data sequence composed as the sum of the two sequences in Figure 4. Ordinarily in this situation we would have to use a factorial Markov model, which would consider all twelve possible combinations between both models' states, and then obtain the most likely state paths using a 2-D Viterbi search. Using the proposed approach we obtain the individual emission scores for the states of the individual HMMs at every time instant and obtain the optimal state sequence independently for both sources. The obtained state sequences are shown in the bottom plots of Figure 5. We note that they are identical to the state sequences obtained from the isolated sequences in Figure 4. Multiple runs over various inputs provide similar results, with only occasional and minor differences between the states extracted from the isolated sequences and from their sum.
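Putting the pieces together, decoding a mixture of K known sources reduces to one shared weight estimation followed by K independent Viterbi decodes, as described above. The sketch below reuses the helpers from the earlier sketches (plsi_decompose, state_scores, viterbi) and assumes each source model is a plain dict holding its dictionary, selection sets, initial probabilities and transition matrix; this layout is an illustrative assumption, not the authors' data structure.

    import numpy as np

    def decode_mixture(X_mix, models):
        """Equation 16 followed by independent Viterbi decoding per source.

        X_mix: magnitude spectrogram of the mixture, shape (frames, freqs).
        models: list of dicts with keys 'P_fz', 'select', 'pi', 'T'.
        Returns one state path per source; cost is O(K N^2) per frame."""
        # Shared weight estimation over the union of all sources' bases
        P_all = np.hstack([m['P_fz'] for m in models])
        _, P_tz = plsi_decompose(X_mix, P_fz=P_all)
        paths, offset = [], 0
        for m in models:
            Z = m['P_fz'].shape[1]
            # Equation 16: per-source state scores from this source's slice of weights
            emit = state_scores(P_tz[:, offset:offset + Z], m['select'])
            paths.append(viterbi(m['pi'], m['T'], emit))
            offset += Z
        return paths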

5. RECOGNIZING MIXTURES OF SPEAKERS

In this section we present two experiments that demonstrate the use of this model in speech recognition applications. We present a small experiment that performs digit recognition on simultaneous digit utterances by the same speaker, and a larger experiment based on the speech separation challenge data [1].

Fig. 4. Two input patterns (top figures) and their corresponding state sequences (bottom figures) as discovered by the proposed model.

Fig. 5. Sum of the patterns in Figure 4 and the extracted state sequences as computed by the models learned on the isolated patterns.

5.1. A small scale experiment

In this experiment we use digit data from the speech separation challenge corpus to illustrate the ability of this model to discover sequences from speech mixtures. We chose ten utterances of five different digits from one speaker and trained an instance of the proposed Markov model for each digit. For simplicity we used four states for all digits and three frequency distributions for each state. As input we used pre-emphasized magnitude spectra computed over short analysis windows. We then used an additional unknown utterance of each digit to construct a set of sound mixtures containing a pair of digits each. We analyzed these mixtures using the pre-learned digit models and examined their estimated likelihoods in order to discover which utterances were spoken in the mixture. A representative example of these results is shown for four mixture cases in Figure 6. The log likelihoods of the spoken digits were significantly higher than those of the non-spoken digits, and from them we can easily deduce the contents of the recording. In the next section we generalize this idea to a much larger and more thorough experiment.

Fig. 6. Estimated model log likelihoods evaluated on mixtures of digits. Each subplot contains the log likelihoods of each digit model for a different digit mixture, as denoted above each plot. As shown here, the two highest log likelihoods coincide with the digits that were spoken in that mixture, thereby providing us with the digit recognition.

5.2. A large scale experiment

In this section we describe an experiment using the speaker separation challenge data set [1]. We present results both on the task set forth for this challenge and on full word recognition. The data in this challenge are composed of mixture recordings of two speakers simultaneously uttering sentences of a predefined structure. In the first case the task we evaluate is to identify a specific word in the sentence uttered by the primary speaker, whereas in the second we attempt to recognize all words of both utterances.

The features we used were magnitude spectral features, computed with a short analysis frame and a smaller frame advance. The magnitude spectra were pre-emphasized so that the higher-frequency content was more pronounced. We trained the proposed model for each word and each speaker, using the number-of-states guidelines provided by the dataset documentation. We used one frequency distribution per state and trained each model for 500 iterations. The resulting models from each speaker were then combined to form a larger Markov model which can model an entire target sentence, with equiprobable jumps between all candidate words at each section. For each mixture sentence the speaker identities were provided in advance, and the two Markov models describing all the possible utterances were used to estimate the most likely state sequence for each speaker, as described in the previous section. The results of these simulations are shown in Tables 1 and 2: Table 1 for the proposed challenge task, and Table 2 for the word recognition rates of both simultaneous utterances.
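As an illustration of the likelihood ranking used in the small-scale experiment above (Figure 6), one plausible scoring, reusing the helpers from the earlier sketches, is to run a scaled forward pass per digit model on the weights obtained from the mixture and rank the models by the accumulated log normalizers. This is a sketch of one reasonable procedure under assumed data layouts, not the authors' exact implementation.

    import numpy as np

    def rank_models(P_tz_per_model, models):
        """Rank pre-learned digit models on a mixture by an approximate forward
        log-likelihood.  P_tz_per_model[name]: frame-wise weights for that model's
        bases; models[name]: dict with 'pi', 'T', 'select' (assumed layout)."""
        scores = {}
        for name, m in models.items():
            emit = state_scores(P_tz_per_model[name], m['select'])
            alpha = m['pi'] * emit[0]
            log_score = np.log(alpha.sum() + 1e-300)
            alpha /= alpha.sum() + 1e-300
            for t in range(1, emit.shape[0]):
                alpha = (alpha @ m['T']) * emit[t]
                log_score += np.log(alpha.sum() + 1e-300)
                alpha /= alpha.sum() + 1e-300
            scores[name] = log_score
        # The two top-ranked models indicate the two digits present in the mixture.
        return sorted(scores, key=scores.get, reverse=True)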
The SNR column describes the amplitude difference between the primary and the secondary speakers. As expected, the louder the primary speaker is, the better the results we achieve. The Same speaker column shows the results when the two utterances were recorded from the same speaker. This is the worst-case scenario, since the dictionary elements in our model will have maximal overlap and the state posterior probabilities will be unreliable. We clearly see that in this case the results are the lowest we obtain.

The Same gender column describes the results when the two speakers were of the same gender. This is a somewhat better situation, since there will be less overlap between the state dictionary elements, and the results reflect that by being higher. Finally, the even better case is when the two speakers are of different gender, in which case there is a high likelihood that dictionary elements will not overlap significantly, and thus we obtain the best results. The final two columns present the average results of our proposed model and, for a baseline comparison, the GHMM Avg. column, which presents the average results we obtained using the same representation and a Gaussian-state HMM while treating the secondary speaker as noise. The overall results we obtain rank high in terms of previously achieved results for this task [1], and come at a significantly lower computational cost than other approaches due to the efficient decoding scheme we introduce.

SNR     Same speaker   Same gender   Diff gender   Avg.     GHMM Avg.
6 dB    58.%           68.%          69.8%         65.%     8.0%
3 dB    6.%            6.%           6.7%          58.0%    7.%
0 dB    .7%            5.9%          60.5%         8.6%     9.%
-3 dB   .7%            .8%           5.0%          9.%      0.8%
-6 dB   .6%            6.0%          5.7%          .%       5.5%
-9 dB   8.7%           .5%           7.0%          5.%      .%

Table 1. Detailed task results using the proposed model.

SNR     Same speaker   Same gender   Diff gender   Avg.
Clean   N/A            N/A           N/A           88%
6 dB    68% / %        80% / 59%     8% / 70%      77% / 5%
3 dB    57% / %        77% / 67%     80% / 76%     7% / 6%
0 dB    6% / 5%        68% / 75%     76% / 80%     6% / 69%
-3 dB   5% / 65%       6% / 80%      7% / 8%       55% / 76%
-6 dB   6% / 7%        5% / 8%       6% / 86%      7% / 8%
-9 dB   % / 80%        8% / 87%      57% / 87%     % / 8%

Table 2. Overall word recognition results using the proposed model. Left percentages denote the correct recognition rate for the primary speaker's words; right percentages, for the secondary speaker's.

An interesting point to make here is that the representation we used balances a tradeoff between mixture modeling and recognition. The fine frequency resolution and linear amplitude scale that we use aid in discriminating the two speakers and facilitate the additivity assumption, but they also impede recognition, since they highlight pitch and amplitude variances. In contrast, a speech recognition system would use a lower frequency resolution that conceals pitch information but maintains spectral shape, and would also use that representation in the log-amplitude domain, so that subtle amplitude patterns are easier to detect. Selecting the proper representation involves trading off the ability to discriminate sources against the ability to recognize, something which is application dependent.

Once the state transitions have been estimated from a mixture, it is also trivial to perform separation of the constituent sources. Since this operation is out of the scope of this paper, we defer the presentation of this experiment to future publications.

6. CONCLUSIONS

In this paper we introduced the Markov Selection Model, a new statistical model which combines models used for source separation with a Markov structure. We demonstrated how this model can learn and recognize sequences, and also perform recognition and state estimation even when presented with mixed signals. Superficially this model is similar to the one in [9], but on closer inspection it provides a more extensive basis model and is substantially different in estimation and inference. We formulate this model in such a way that it allows us to perform state estimation on mixed sequences with linear complexity in the number of sources. This is a significant computational improvement as compared to similarly employed factorial Markov models, and one that does not sacrifice performance by a noticeable amount.
This structure can also be represented as a Conditional Random Field, which can result in additional structural possibilities and a more straightforward inference formulation, and it can also be used for solving source separation and denoising problems. We anticipate addressing these possibilities in future work.

Acknowledgements

The authors acknowledge Gautham Mysore's help in the preparation of this manuscript.

7. REFERENCES

[1] martin/SpeechSeparationChallenge.htm
[2] Raj, B. and P. Smaragdis. Latent variable decomposition of spectrograms for single channel speaker separation, in 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2005).
[3] Virtanen, T. and A. T. Cemgil. Mixtures of Gamma priors for non-negative matrix factorization based speech separation, in 8th International Conference on Independent Component Analysis and Signal Separation (ICA 2009).
[4] Smaragdis, P., M. Shashanka, and B. Raj. A sparse non-parametric approach for single channel separation of known sounds, Neural Information Processing Systems (NIPS) 2009.
[5] Hofmann, T. Probabilistic latent semantic indexing, in 1999 ACM SIGIR Special Interest Group on Information Retrieval Conference (SIGIR 1999).
[6] Lee, D. D. and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401, 1999.
[7] Bourlard, H. and N. Morgan. Hybrid HMM/ANN systems for speech recognition: Overview and new research directions, LNCS vol. 1387, Springer Berlin, 1998.
[8] Rabiner, L. R. and B. H. Juang. An introduction to hidden Markov models. IEEE Acoustics, Speech and Signal Processing (ASSP) Magazine, 3(1): 4-16, 1986.
[9] Ozerov, A., C. Févotte and M. Charbit. Factorial scaled hidden Markov model for polyphonic audio representation and source separation, in 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2009).


Bayesian Hierarchical Modeling for Music and Audio Processing at LabROSA

Bayesian Hierarchical Modeling for Music and Audio Processing at LabROSA Bayesian Hierarchical Modeling for Music and Audio Processing at LabROSA Dawen Liang (LabROSA) Joint work with: Dan Ellis (LabROSA), Matt Hoffman (Adobe Research), Gautham Mysore (Adobe Research) 1. Bayesian

More information

REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES

REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES Kedar Patki University of Rochester Dept. of Electrical and Computer Engineering kedar.patki@rochester.edu ABSTRACT The paper reviews the problem of

More information

Information retrieval LSI, plsi and LDA. Jian-Yun Nie

Information retrieval LSI, plsi and LDA. Jian-Yun Nie Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and

More information

Statistical Sequence Recognition and Training: An Introduction to HMMs

Statistical Sequence Recognition and Training: An Introduction to HMMs Statistical Sequence Recognition and Training: An Introduction to HMMs EECS 225D Nikki Mirghafori nikki@icsi.berkeley.edu March 7, 2005 Credit: many of the HMM slides have been borrowed and adapted, with

More information

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT CONOLUTIE NON-NEGATIE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT Paul D. O Grady Barak A. Pearlmutter Hamilton Institute National University of Ireland, Maynooth Co. Kildare, Ireland. ABSTRACT Discovering

More information

Inference and estimation in probabilistic time series models

Inference and estimation in probabilistic time series models 1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu

More information

Hidden Markov Models. Dr. Naomi Harte

Hidden Markov Models. Dr. Naomi Harte Hidden Markov Models Dr. Naomi Harte The Talk Hidden Markov Models What are they? Why are they useful? The maths part Probability calculations Training optimising parameters Viterbi unseen sequences Real

More information

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,

More information

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL Dionisio Bernal, Burcu Gunes Associate Proessor, Graduate Student Department o Civil and Environmental Engineering, 7 Snell

More information

A State-Space Approach to Dynamic Nonnegative Matrix Factorization

A State-Space Approach to Dynamic Nonnegative Matrix Factorization 1 A State-Space Approach to Dynamic Nonnegative Matrix Factorization Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo arxiv:179.5v1 [cs.lg] 31 Aug 17 Abstract Nonnegative matrix factorization

More information

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013 More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative

More information

Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models.

Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models. Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.com This Lecture Expectation-Maximization (EM)

More information

1 What is a hidden Markov model?

1 What is a hidden Markov model? 1 What is a hidden Markov model? Consider a Markov chain {X k }, where k is a non-negative integer. Suppose {X k } embedded in signals corrupted by some noise. Indeed, {X k } is hidden due to noise and

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information