Acta Polytechnica Hungarica Vol., No., 04

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms: Case Study of EM Algorithm

Vladimir Mladenović, Miroslav Lutovac, Dana Porrat

The Higher Technical School of Professional Studies, Nemanjina, 000 Požarevac, Serbia
The University Singidunum, Danijelova 3, 000 Belgrade, Serbia
The School of Computer Science and Engineering, Hebrew University of Jerusalem, Jerusalem 9904, Israel

Abstract: Many researchers start their work by studying theory in order to get better insight into measured phenomena. Sometimes they cannot get the numeric values of parameters used in published equations. This is even more complicated when the theory is statistical and there are no closed-form deterministic solutions. In this paper we introduce an original approach to analyzing a popular and frequently cited tutorial paper on the Expectation-Maximization (EM) algorithm. The original paper has sufficient information to understand the algorithm. Using the tools of a Computer Algebra System and the methods of Symbolic Processing (SP), typographical errors are discovered and the rederived equations are used for automatic generation of algorithmic code. The examples are evaluated using the automatically derived code, and the final numeric values agree with the values from the original paper. The derived results are used for further optimization, such as deriving the computational error in closed form. From the closed-form solutions, the precision can be derived in terms of the number of iterations, or the minimal number of iterations can be expressed in terms of the required precision. This helps to optimize the algorithm parameters so that the algorithm becomes more efficient.

Keywords: Symbolic processing; EM algorithm; iteration; convergence; ML estimation; Computer algebra system

1 Introduction

Computer algebra is a field of computer science concerned with the development, implementation, and application of algorithms that manipulate and analyze expressions. Some of the issues are developing algorithms (such as algebraic algorithms and symbolic-numeric algorithms), studying how to build computer algebra systems (memory management, higher-order type systems, optimizing compilers), and mathematical knowledge management (representation of mathematical objects). Computer algebra is based on well-defined semantics, composed constructions, and algebraic algorithms. Symbolic computation is concerned with alternative forms, partially specified domains, and symbolic derivations in general. Computer algebra systems are used to simplify rational functions, factor polynomials, find the solutions to a system of equations, give general formulas as answers, model industrial mathematical problems, and perform various other manipulations. Symbolic processing can be treated as a transformation of expression trees built from symbols for operations (+, sin), variables, and constants, together with simplification (for example, expression equivalence).

The EM (Expectation-Maximization) algorithm is analysed with a new type of processing, called symbolic processing. We illustrate the main drawbacks of numerical tools that have led to erroneous conclusions in the application of the algorithm through a practical example presented in []. Using symbolic processing, we derive the error function in closed form as a function of the number of iterations. Next, we determine the minimum number of iterations required to obtain a correct result, as well as the required number of iterations as a function of the number of accurate bits of the final result of an algorithm.

The main aim of this paper is to show how Computer Algebra (CA) and symbolic processing can be used in the design of numeric algorithms and in discovering the properties of the EM algorithm.
2 The Method of Symbolic Processing

Modern research in the field of engineering science usually starts with theoretical analysis. The next step can be simulation in order to prove the theoretical assumptions. The final step is practical implementation and measurement. Usually, the theoretical derivations are made for ideal parameter values, such as an infinite number of samples or known signal values for time from $-\infty$ to $+\infty$. Simulation is used to gain insight into how systems function. For a finite number of measured values, one can expect that the result agrees well with the simulation.

Probably the first critical step is to set the simulation parameters and write code that fits the theory. The second critical step is to choose the number of iterations or the number of input samples. Symbolic processing can help to derive simulation code without errors and to find typographical errors in published results. This paper presents such an analysis. Next, symbolic processing can be used to characterize the processing errors, for example as a closed-form expression for the number of required iteration steps, or as the error function due to finite word-length [, 3]. Therefore, symbolic processing can help gain insight into how a system works, which is preferable to experimenting with numeric simulations.

Symbolic processing plays an important role in education. Moreover, it can be used in practice, because numerical tools may have problems with accuracy and a significant increase in complexity. Primarily, the analysis area is essentially continuous and infinite [4]. This paper aims to highlight the role of symbolic processing, especially where numerical approaches fail to produce the right algorithms, which are important for many areas of engineering research, especially in communications. Symbolic processing resolves certain disagreements between theoretical performance and numerical simulations. Numerical processing may show unsatisfactory results [5] and can drive researchers to erroneous conclusions. Using symbolic tools, it is easy to find and prove mistakes and gain a better insight into the whole process of analysis, simulation and modeling.

Symbolic processing works with symbols. Any symbol can be replaced by a number when symbolic expressions become impractical, for example, for plotting a response to a specific excitation. Therefore, programs written using symbolic processing can be seen as sets of instructions that manipulate symbols and can be used to perform a much wider range of activities [6].
The efficiency of symbolic processing becomes more important as systems and signals become more complex. There is no algebraic solution for all forms of processing, and symbolic processing will not be able to solve all problems. Even then, the final result can be expressed in the form of a special function that can be computed for known numerical values of the system parameters. In the next section, we demonstrate the derivation of the EM algorithm using symbolic processing.

3 The Expectation-Maximization Algorithm

The Expectation-Maximization (EM) algorithm is ideally suited to estimating the parameters of a probability function, especially when direct access to all of the data is impossible [5]. A brief description of the algorithm, suitable for signal processing practitioners, can be found in [5], together with an illustrative example. Unfortunately, the algorithmic steps there disagree with the theoretical statements and do not reproduce the numeric results given in the table of [5]. Those who want to test the knowledge learned from the example may be confused about the reason for such disagreements. The error can be in the initial statements, in the derived algorithmic steps, or it can be a typographical error. A computer algebra system (specifically Mathematica) and symbolic processing are used in this paper to start the analysis from the theoretical statements, derive the algorithmic code, and calculate the final results.

It is important to emphasize the role of the new analysis of the EM algorithm, because EM is the backbone of algorithms that are used in significant analyses, measurement data processing and simulations [, 7, 8]. The main goal of EM is to approximate the Maximum Likelihood estimate from incomplete data. The most general form starts from the pdf of the incomplete data:

$$g(y \mid \theta) = \int_{\mathcal{X}(y)} f(x \mid \theta)\, dx \tag{1}$$

where $f(x \mid \theta)$ is the probability density function of the complete data, and $\theta$ is the set of parameters. An observation $y$ determines a subset of the complete-data space $\mathcal{X}$ that is denoted as $\mathcal{X}(y)$.

The EM algorithm consists of two steps: the E-step and the M-step. The E-step (Expectation step) takes an expectation with respect to the unknown variables, using the current estimate of the parameter and conditioned upon the observation.
The M-step (Maximization step) then provides a new estimate of the parameters; it produces a maximum-likelihood (ML) estimate of the parameters when there is a many-to-one mapping. These two steps are iterated until convergence [5].

For the E-step compute:

$$Q(\theta \mid \theta^{[k]}) = E\left[\log f(x \mid \theta) \mid y, \theta^{[k]}\right] \tag{2}$$

where $Q(\theta \mid \theta^{[k]})$ is the Q function.

For the M-step compute:

$$\theta^{[k+1]} = \arg\max_{\theta} Q(\theta \mid \theta^{[k]}) \tag{3}$$

4 An Example Problem

The example problem is to find the unknown value of $p$ using the EM algorithm. In this example the initial value of $p$ is defined as a symbol, and we know that its true value is 1/2 (as specified in the example in [9]). The expressions derived in this paper were not derived in [5] and cannot be obtained by hand. On the other hand, the estimate of $p$ reported in [5] deviates from the true value of 0.5, but [5] nowhere explains the deviation. We obtain the exact limiting value and explain how it arises.

This section repeats the example from [5]. The application of EM to this example is calculated symbolically (using Mathematica), and we show the mathematical description.

Let $\{X_1, X_2, \ldots, X_m\}$ be a set of random variables. For example, $X_1$ represents the number of round dark objects, $X_2$ represents the number of square dark objects, and $X_3$ represents the number of light objects. Let $x_1, x_2, \ldots, x_m$ be a sequence of outcomes of the random variables $X_1, X_2, \ldots, X_m$ that have been observed. It is assumed that $X_i$ is independent of $X_j$ for $i \neq j$. If the number of outcomes (total number of observed objects) is $n$, then the $x_i$ are nonnegative integers such that

$$\sum_{i=1}^{m} x_i = n \tag{4}$$

If $X_1, X_2, \ldots, X_m$ are mutually exclusive events with $P(X_1) = p_1, \ldots, P(X_m) = p_m$, then the probability that $X_1$ occurs $x_1$ times, ..., $X_m$ occurs $x_m$ times is given by

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_m = x_m) = n! \prod_{i=1}^{m} \frac{p_i^{x_i}}{x_i!} \tag{5}$$

where the $p_i$ are constants with $p_i \geq 0$ and

$$\sum_{i=1}^{m} p_i = 1 \tag{6}$$

The joint distribution of $X_1, X_2, \ldots, X_m$ is a multinomial distribution, and $P(X_1 = x_1, X_2 = x_2, \ldots, X_m = x_m)$ is given by the corresponding coefficient of the multinomial series

$$(p_1 + p_2 + \cdots + p_m)^n \tag{7}$$

As an example, assume that the objects are known to be trinomially distributed, that is $m = 3$, as in [5]. Assume further that the probabilities are also known:

$$p_1 = \frac{1}{4} \tag{8}$$

$$p_2 = \frac{1}{4} + \frac{p}{4} \tag{9}$$

Then, from (6), it follows that

$$p_3 = 1 - p_1 - p_2 = \frac{1}{2} - \frac{p}{4} \tag{10}$$

We can use the following notation for the probability density function:

$$f(x_1, x_2, x_3 \mid p) = P(X_1 = x_1, X_2 = x_2, X_3 = x_3) \tag{11}$$

For the unknown parameter $p$, the function becomes

$$f(x_1, x_2, x_3 \mid p) = \frac{n!}{x_1!\, x_2!\, x_3!} \left(\frac{1}{4}\right)^{x_1} \left(\frac{1}{4} + \frac{p}{4}\right)^{x_2} \left(\frac{1}{2} - \frac{p}{4}\right)^{x_3} \tag{12}$$

Suppose that a feature extractor is employed that can distinguish the color of the objects, but cannot distinguish the shape. From the measurement, the number of dark objects is known, $y = x_1 + x_2$, but the individual values of $x_1$ and $x_2$ are unknown. The basic idea of the EM algorithm is to find expressions that can be used for iterated computing of $x_1$ or $x_2$ based on the known $y = x_1 + x_2$ and the distribution function.

4.1 Expectation Step

In the E-step, the expected values of $x_1$, $x_2$, and $x_3$ have to be computed using the initial estimate and the measured data. The algorithm is derived using the knowledge of the multinomial distribution.
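As a numeric companion to the trinomial density of equation (12), the following is a minimal Python sketch using exact rational arithmetic. The function names `trinomial_pmf` and `total_probability` are illustrative choices, not taken from the paper, whose own derivations are carried out in Mathematica. The sketch only checks that the density, with class probabilities 1/4, 1/4 + p/4 and 1/2 - p/4, sums to one over all outcomes.

```python
from fractions import Fraction
from math import factorial


def trinomial_pmf(x1, x2, x3, p):
    """Equation (12): trinomial probability with class probabilities
    (1/4, 1/4 + p/4, 1/2 - p/4). Names are illustrative assumptions."""
    n = x1 + x2 + x3
    p = Fraction(p)
    p1 = Fraction(1, 4)
    p2 = Fraction(1, 4) + p / 4
    p3 = Fraction(1, 2) - p / 4
    coeff = Fraction(factorial(n),
                     factorial(x1) * factorial(x2) * factorial(x3))
    return coeff * p1**x1 * p2**x2 * p3**x3


def total_probability(n, p):
    """Sum equation (12) over every (x1, x2, x3) with x1 + x2 + x3 = n."""
    return sum(trinomial_pmf(x1, x2, n - x1 - x2, p)
               for x1 in range(n + 1)
               for x2 in range(n - x1 + 1))
```

Exact fractions are used so that the check is an identity rather than a floating-point approximation.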

Starting with the measured $y$ (where $y = x_1 + x_2$), we have to derive expressions that can be used in an iterated algorithm in order to find the numbers of objects $x_1$ and $x_2$, and the parameter $p$. In other words, we have to derive the expected values of $x_1$ and $x_2$ for the measured $y$ and the estimate of $p$ in the $k$-th iteration:

$$x_1^{[k+1]} = E[x_1 \mid y, p^{[k]}] \tag{13}$$

$$x_2^{[k+1]} = E[x_2 \mid y, p^{[k]}] \tag{14}$$

The expected values can be derived using the multinomial distribution (assuming that $x_1 + x_2 + x_3 = n$):

$$P(X_1 = x_1, X_2 = x_2, X_3 = x_3) = n!\, \frac{p_1^{x_1}\, p_2^{x_2}\, p_3^{x_3}}{x_1!\, x_2!\, x_3!} \tag{15}$$

Notice that $(X_1 + X_2, X_3)$ is binomial with class probabilities $(p_1 + p_2, p_3)$; then (assuming $p_1 + p_2 + p_3 = 1$ and $y + x_3 = n$) the probability $P(Y = y, X_3 = x_3)$ becomes

$$P(Y = y, X_3 = x_3) = n!\, \frac{(p_1 + p_2)^{y}\, p_3^{x_3}}{y!\, x_3!} \tag{16}$$

Assume that the probability of some event $x$ is known, say $P(x)$, and that the number of events can range from 0 to $n$. The expectation of $x$ is defined by

$$E[x] = \sum_{x=0}^{n} x\, P(x) \tag{17}$$

Noticing that $y = x_1 + x_2$ and $0 \leq x_1 \leq y$, the expectation of $x_1$ requires the conditional probability $P(x_1)$:

$$E[x_1] = \sum_{x_1=0}^{y} x_1\, P(x_1) \tag{18}$$

Assuming that $y$ has occurred, the conditional probability $P(x_1)$ equals

$$P(x_1) = \frac{P(X_1 = x_1, Y = y)}{P(Y = y)} \tag{19}$$

Applying those definitions to $x_1^{[k+1]}$, the iterated algorithm can be derived. Unfortunately, the relations presented in [5] disagree with what is expected.

4.2 Maximization Step

The maximum of some function, say $f(x_1, x_2, x_3 \mid p)$, can be obtained by taking the derivative of $f$ with respect to the parameter $p$, equating it to zero, and solving for $p$. Since the logarithm is monotonically increasing, maximizing $\log(f)$ is equivalent to maximizing $f$. The function $f$ is known in terms of the parameter $p$, and we can try to solve the following equation for $p$:

$$\frac{d}{dp} \log f(x_1, x_2, x_3 \mid p) = 0 \tag{20}$$

If a solution exists, it becomes the second part of the EM algorithm.

5 Implementation of Computer Algebra System Code into EM Algorithm

In this paper, we re-derive the expressions obtained in [5] because they do not produce the expected results in the numeric example presented in [5]. Computer algebra systems (CAS), such as Mathematica [0], can be used for the derivation of all equations required for the algorithm. The main motive is to avoid manual derivation and thus derive all relations without error.

All derivations are in a document called a notebook, which looks like a technical paper in electronic form. There are cells with textual description, such as title, section head, and text with formulas, but also cells named input and output. In the input cells we write expressions that can be evaluated.

5.1 Entering Knowledge in CAS

The simplest way to re-derive equations is to start with definitions. The first line of the first input cell can be the definition of the trinomial distribution.

Executing the input cell produces no visible output cell. In effect, we define new knowledge that becomes a part of the CAS. The meaning of the code is that we define a function called f with 6 arguments, (x1, x2, x3, p1, p2, p3), where in place of any argument written with _ we can use any other symbol or number (instead of x1_ we can use $x_1$). The symbol := tells the CAS to remember the expression that follows, but without executing it, and so there is no output cell.

Once the knowledge is entered into the CAS, we can use it for presentation or evaluation. Mathematical expressions may look different from code. Let us replace the symbolic arguments with symbols that appear in textbooks (use $x_1$ instead of x1), and present the result in the traditional form (transform the expression using the command //TraditionalForm).

This expression still does not look the same as equation (5). Therefore, we evaluate the expression again for the same symbolic arguments, and in the next line use the expression (% stands for the previous expression), apply the substitution as in equation (4), and finally display the result in the traditional form.

Now we obtain the same result as equation (5). This means that the entered knowledge is visually identical to that in the textbook. Notice that the last output is a result of the line that follows input In[3], which is actually input In[4], and the displayed output is Out[4].

The binomial distribution can be defined in a similar way. This defines a function called f with 5 arguments, (y1, y2, p1, p2, n). The head of the function is the same as for the trinomial distribution, but the difference is in the number of arguments. In place of any argument written with _ we can use any other symbol or number (instead of y2_ we can use $x_3$, and instead of p1_ we can use $p_1 + p_2$). Replacing the arguments with appropriate symbols, and using the command for displaying the expression in the traditional form, equation (16) is derived.
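The two definitions described above can be mirrored outside a CAS. The following Python sketch is an illustrative stand-in for the notebook cells, not a transcription of them: the names `trinomial`, `binomial` and `marginal_dark` are assumptions, and the binomial here takes four arguments because n is recomputed from the counts. It checks numerically that summing the trinomial of equation (15) over all splits of y into x1 and x2 gives the binomial of equation (16).

```python
from fractions import Fraction
from math import factorial


def trinomial(x1, x2, x3, p1, p2, p3):
    """Equation (15): n! p1^x1 p2^x2 p3^x3 / (x1! x2! x3!)."""
    n = x1 + x2 + x3
    coeff = Fraction(factorial(n),
                     factorial(x1) * factorial(x2) * factorial(x3))
    return coeff * p1**x1 * p2**x2 * p3**x3


def binomial(y1, y2, q1, q2):
    """Equation (16) with y1 = y, y2 = x3, q1 = p1 + p2, q2 = p3."""
    n = y1 + y2
    coeff = Fraction(factorial(n), factorial(y1) * factorial(y2))
    return coeff * q1**y1 * q2**y2


def marginal_dark(y, x3, p1, p2, p3):
    """P(Y = y, X3 = x3) obtained by summing out the unknown split of y."""
    return sum(trinomial(x1, y - x1, x3, p1, p2, p3)
               for x1 in range(y + 1))
```

For example, with the hypothetical values p1 = 1/4, p2 = 3/8, p3 = 3/8, `marginal_dark(5, 3, p1, p2, p3)` and `binomial(5, 3, p1 + p2, p3)` return the same fraction.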

5.2 Derivation of the Expectation Step

In order to find the expectation of $x_1$, the conditional probability $P(x_1)$ should be computed using equation (19). In the numerator the trinomial distribution is used, while in the denominator the binomial distribution is applied. The command Simplify is used to find the simplest form of the expression.

Next, the derived expression is used in equation (18), and the final result is not the same as in [5].

The same procedure is used for deriving the expected value of $x_2$, and this time the result is the same as in [5].
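For readers without access to the notebook, the expectation step can also be followed by hand. The lines below are a sketch worked directly from equations (15), (16), (18) and (19); they are not a copy of the CAS output, and the binomial-mean identity used in the last step is a standard result assumed here.

```latex
\begin{align*}
P(x_1 \mid y)
  &= \frac{n!\, p_1^{x_1} p_2^{\,y-x_1} p_3^{x_3} / \bigl(x_1!\,(y-x_1)!\,x_3!\bigr)}
          {n!\, (p_1+p_2)^{y} p_3^{x_3} / \bigl(y!\,x_3!\bigr)}
   = \binom{y}{x_1}
     \left(\frac{p_1}{p_1+p_2}\right)^{x_1}
     \left(\frac{p_2}{p_1+p_2}\right)^{y-x_1}, \\[4pt]
E[x_1 \mid y] &= \sum_{x_1=0}^{y} x_1\, P(x_1 \mid y) = \frac{y\, p_1}{p_1+p_2},
\qquad
E[x_2 \mid y] = y - E[x_1 \mid y] = \frac{y\, p_2}{p_1+p_2}.
\end{align*}
```

The conditional distribution is binomial in $x_1$ with $y$ trials, so the two expectations add up to $y$ by construction.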

The sum of the expected values of $x_1$ and $x_2$ should be $y$. This is obvious from the expressions derived using the CAS, and it also proves that one of the two results in [5] is not correct. It is now clear that the sign in the denominator of the expectation of $x_1$ is not correct in [5, Box, equation (37)]. Also, the original text [5] uses two different notations for the same variable $y$, which may confuse some readers.

The derivation presented in this paper shows that all results can be found automatically, and that intermediate results and defined functions can be visually checked against those from textbooks.

Once the general expressions are derived, a particular solution follows as a special case. For example, if the probabilities are known, they can be denoted by the subscript e.

The expected value of $x_1$ for this specific case can be derived by substituting the parameters into the general solution for $x_1$.

Following the same procedure, the expected value of $x_2$ is derived.

The derived results differ from those in [5, equations (5) and (6)], and instead of p, the automatically derived expressions have p /.

5.3 Derivation of the Maximization Step

For the known probabilities of a specific case, the trinomial distribution can be expressed in terms of $x_1$, $x_2$, $x_3$, and the parameter $p$. The same expression is presented in [5].

Since this is a continuous function of $p$, the first derivative can be computed.

At the extremum, the parameter $p$ can be expressed in terms of $x_2$ and $x_3$ (and again, the same expression is presented in [5]).

However, the value of $x_2$ is unknown. Instead of $x_2$, we can use the expected value of $x_2$ that was derived in the expectation step.

In order to obtain the same form of the final result as in [5], some rearrangements can be applied (to collect all terms in the numerator and denominator that are multiplied by $p^{[k]}$).
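The maximization step can be sketched by hand from equations (12) and (20). The following is a worked derivation under the class probabilities (8) to (10); it is offered as an illustration of what the symbolic steps amount to, not as the notebook's actual output.

```latex
\begin{align*}
\log f(x_1,x_2,x_3 \mid p)
  &= \text{const} + x_2 \log\frac{1+p}{4} + x_3 \log\frac{2-p}{4}, \\[4pt]
\frac{d}{dp}\log f
  &= \frac{x_2}{1+p} - \frac{x_3}{2-p} = 0
  \;\Longrightarrow\;
  p = \frac{2x_2 - x_3}{x_2 + x_3}, \\[4pt]
x_2 \to E[x_2 \mid y, p^{[k]}] &= \frac{y\,(1+p^{[k]})}{2+p^{[k]}}
  \;\Longrightarrow\;
  p^{[k+1]} = \frac{2(y - x_3) + (2y - x_3)\,p^{[k]}}{(y + 2x_3) + (y + x_3)\,p^{[k]}} .
\end{align*}
```

The last line collects the terms multiplied by $p^{[k]}$ in the numerator and denominator, which is the rearrangement the text describes.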

The final result is not the same as in [5]. It seems that a differently indexed iterate of $p$ is used in the recurrence relation in [5].

The derivation presented in this section shows that the final result can be found automatically by following the mathematical notation from textbooks or published papers.

5.4 Derivation of the Code

Once the expressions are derived from the knowledge entered into the CAS, they can be transformed directly into an algorithm in the form of code. The code can be derived for given input parameters, such as the total number of objects, the number of dark or light objects, the initial guess for $p$, the required accuracy, or the probabilities.

The expressions used in the code are a special case of the general expressions, written in terms of the known parameters or the previous value of $p$.
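A minimal Python sketch of such an iterated algorithm is given here for orientation. It is not the Mathematica code of the paper. The update formulas are the hand-derived special case for class probabilities 1/4, 1/4 + p/4 and 1/2 - p/4, and the inputs y = 63 dark objects, x3 = 37 light objects and initial guess 0 are illustrative assumptions, as are the function and parameter names.

```python
def em_estimate(y, x3, p0=0.0, tol=1e-9, max_iter=100):
    """Iterate the E-step and M-step for the trinomial example.

    y   : measured number of dark objects (x1 + x2)
    x3  : measured number of light objects
    p0  : initial guess for p (assumed value, not from the paper)
    Returns a list of rows (k, x1, x2, p) for each iteration k.
    """
    history = []
    p = p0
    k = 0
    change = float("inf")
    while change >= tol and k < max_iter:
        k += 1
        # E-step: expected split of y given the current estimate of p
        x1 = y / (2.0 + p)
        x2 = y * (1.0 + p) / (2.0 + p)
        # M-step: maximize the log-likelihood with x2 replaced by its expectation
        p_new = (2.0 * x2 - x3) / (x2 + x3)
        change = abs(p_new - p)
        p = p_new
        history.append((k, x1, x2, p))
    return history


if __name__ == "__main__":
    # Assumed illustrative inputs: 63 dark and 37 light objects
    for k, x1, x2, p in em_estimate(y=63, x3=37):
        print(f"{k:2d}  {x1:10.6f}  {x2:10.6f}  {p:10.6f}")
```

The loop stops when two successive estimates of p differ by less than `tol`, which is one common stopping rule; a fixed iteration count would work equally well.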

The iterated algorithm can be a while loop, in which the new number of objects or the new $p$ is calculated.

Finally, the results of the iterations can be viewed as a table of values. The number of displayed digits can also be specified.

The values from the table are the same as in [5]. Therefore, using CAS, we derive all expressions and code for the EM algorithm. Also, we avoid the typographical errors in the original text, prove all of the steps, and compute the numerical results for a specific case. For example, it is obvious that y cannot be 00 [5, page 50 after equation (8)], because n 00 and y n. A typographical error is in the derivation of the expectation step [5, page 53, Box ], where the index in the summation formula starts from 0. Another typographical error is in the derivation of the expectation step [5, page 49, equation (6)] where x should be used instead of x.

6 Analytical Solution of the EM Algorithm

The EM algorithm can be further analyzed by employing the symbolic expressions derived using CAS. Instead of numeric analysis, closed-form expressions can be derived. This can help us to gain insight into how a system works and to understand the properties of the algorithm.

6.1 Finding Closed Form Solution

Firstly, we generate an equation that describes the iterated computing of the parameter $p$, knowing the initial value.

The command RSolve can be used to derive a closed-form expression for $p$ in terms of the iteration step $k$, with the other parameters kept as symbols.

In the specific case, when the numeric values of some parameters are known, the closed-form expression can be in terms of $k$ only.

The numeric values of $p$ in each iteration step can be displayed in table format.

The closed-form expression can be used to find the exact value after an infinite number of iterations.
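The paper obtains the closed form with RSolve. As a hedged alternative for readers without a CAS, the sketch below uses the standard fixed-point method for linear fractional recurrences, applied to the hand-derived update p -> (a + b p) / (c + d p) with a = 2(y - x3), b = 2y - x3, c = y + 2x3, d = y + x3. The function names, the inputs y = 63, x3 = 37 and the initial value 0 are assumptions for illustration; the result is compared against direct iteration in exact arithmetic.

```python
from fractions import Fraction


def recurrence_coefficients(y, x3):
    """Coefficients of p_next = (a + b*p) / (c + d*p) for the trinomial example."""
    return 2 * (y - x3), 2 * y - x3, y + 2 * x3, y + x3


def iterate(p0, k, y, x3):
    """Apply the recurrence k times, starting from p0."""
    a, b, c, d = recurrence_coefficients(y, x3)
    p = Fraction(p0)
    for _ in range(k):
        p = (a + b * p) / (c + d * p)
    return p


def closed_form(p0, k, y, x3):
    """Closed form via the two fixed points r1 (attracting) and r2 = -1.

    The ratio (p_k - r1) / (p_k - r2) shrinks by a constant factor kappa
    at every step, so it equals kappa**k times its initial value.
    Requires p0 != -1.
    """
    _, b, _, d = recurrence_coefficients(y, x3)
    r1 = Fraction(2 * (y - x3), y + x3)
    r2 = Fraction(-1)
    kappa = (b - d * r1) / (b - d * r2)        # equals x3 / (3*y)
    ratio = kappa**k * (Fraction(p0) - r1) / (Fraction(p0) - r2)
    return (r1 - r2 * ratio) / (1 - ratio)
```

Because the contraction factor equals x3 / (3y) in this sketch, the error shrinks geometrically, which is consistent with the linear growth of accurate bits discussed later in the text.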

Any of the parameters of the general solution can be changed to compute another set of values, such as for any value of $n$ and a specified exact value for $p$ (named pTrue).

Since we already have the limit value as a closed-form solution in terms of $n$, the exact values can be computed for a range of different $n$.

Those results are illustrated in Fig. 1. The probability can be in the range 0.45 to 0.55, depending on $n$. Although the true value of $p$ is 1/2, named pTrue in the code, the computed value lies in the range 0.45 to 0.55 as the exact result of the algorithm after an infinite number of iterations. The exact value $p = 0.5$ can be obtained for a large number of objects, or as a mean value over several $n$.
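One way to see why the limit depends on $n$ is to solve for the fixed point by hand. The derivation below is a sketch based on the hand-derived E-step and M-step for the class probabilities (8) to (10); it is an illustration under those assumptions, not the closed form printed by the CAS.

```latex
\begin{align*}
p^{*} = \frac{2x_2^{*} - x_3}{x_2^{*} + x_3},
\quad x_2^{*} = \frac{y\,(1+p^{*})}{2+p^{*}}
\;\Longrightarrow\;
y\,(2 - p^{*}) = x_3\,(2 + p^{*})
\;\Longrightarrow\;
p^{*} = \frac{2\,(y - x_3)}{y + x_3} = \frac{2\,(y - x_3)}{n}.
\end{align*}
```

Since $y$ and $x_3$ are integer counts with $y + x_3 = n$, changing $y$ by one moves $p^{*}$ by $4/n$. Under this sketch, the limit can equal 0.5 exactly only when $y = 5n/8$ is an integer, and the attainable values become denser as $n$ grows.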

Figure 1
Estimated value of p in terms of the number of observed objects n

6.2 Finding Number of Iterations k

Usually, the errors in iterated algorithms are estimated. Since we already have the exact value after an infinite number of iterations, and the current value at any iteration, the error can be expressed in closed form.

In addition, we specify the acceptable error as $1/2^{b}$, where $b$ is the number of accurate bits. Solving the equation in which the error equals the acceptable error, we can determine $b$ in terms of the iteration index $k$.

For three different numbers of iterations, it follows that 4 iterations are sufficient for 8 bits.

Figure 2 shows that the number of accurate bits is a linear function of the number of iterations.

Figure 2
Number of exact bits of estimated p vs. number of iterations k

Instead of the number of accurate bits, we can set the number of accurate decimal digits.

Solving the equation in which the error equals the required decimal precision, we can determine the minimal number of iterations. The minimal number of iterations is a function of the number of digits.

For 6-digit precision, the minimal number of iterations is 9.
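The relation between precision and iteration count can be explored numerically with the following Python sketch. It is not the paper's closed-form solution: it simply iterates the hand-derived update in exact arithmetic and compares each iterate with the limit 2(y - x3)/(y + x3). The stopping criterion (absolute error below the tolerance), the inputs y = 63, x3 = 37, the initial value 0, and all names are assumptions for illustration.

```python
from fractions import Fraction
from math import log2


def em_step(p, y, x3):
    """One combined E-step and M-step of the trinomial example."""
    x2 = y * (1 + p) / (2 + p)
    return (2 * x2 - x3) / (x2 + x3)


def error_after(y, x3, k, p0=0):
    """Absolute error of the k-th iterate relative to the limit value."""
    limit = Fraction(2 * (y - x3), y + x3)
    p = Fraction(p0)
    for _ in range(k):
        p = em_step(p, y, x3)
    return abs(p - limit)


def accurate_bits(y, x3, k, p0=0):
    """Number of accurate bits b such that the error equals 1 / 2**b."""
    return -log2(float(error_after(y, x3, k, p0)))


def min_iterations(y, x3, tol, p0=0, max_iter=1000):
    """Smallest k whose absolute error is below tol."""
    for k in range(1, max_iter + 1):
        if error_after(y, x3, k, p0) < tol:
            return k
    raise ValueError("tolerance not reached within max_iter iterations")
```

Under these assumed inputs, `min_iterations(63, 37, Fraction(1, 2**8))` gives 4 and `min_iterations(63, 37, Fraction(1, 10**6))` gives 9, which matches the figures quoted in the text, and `accurate_bits` grows by an almost constant amount per iteration.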

Recalling the table with results, we can see that the 9th iteration has 6-digit precision. This result can be used to fix the number of iterations instead of using a while loop. This is more suitable for numeric algorithms, especially when instruction caching can improve the speed of computing (no branch at the end of the set of instructions). In a similar way, by increasing n, the maximal number of iterations can be determined for the required precision. For n from 0 to 0000, 9 iterations are sufficient for 6-digit precision.

Conclusions

In this paper, we analyze the EM (Expectation-Maximization) algorithm using a new approach and method named symbolic processing. We present a straightforward, systematic procedure for the automatic derivation of expressions starting from basic knowledge of the phenomena, automatic derivation of the code, processing with the code, and re-evaluation of the numeric results from published papers. Moreover, a closed-form solution can be derived, and symbolic optimization may be performed. The presented procedure can be used as a template for symbolic processing instead of classic numeric processing and manual derivation of new, advanced and challenging algorithms.

We introduce an original approach to analyzing a popular and frequently cited tutorial paper on the Expectation-Maximization (EM) algorithm, which contains a number of typographical errors. We show that a more elegant and accurate solution can be obtained by using symbolic processing. Also, we provide better solutions for the exact values of the unknown probability parameters.

References

[1] J. A. Fessler and A. O. Hero, Space-Alternating Generalized Expectation Maximization Algorithm, IEEE Trans. Signal Processing, Vol. 4, pp , Oct. 1994

[2] M. D. Lutovac and D. V. Tošić, Symbolic Analysis and Design of Control Systems using Mathematica, International Journal of Control, Special Issue on Symbolic Computing in Control, Vol. 79, No., 2006, pp

[3] M. Lutovac, J. Ćertić, Lj. Milić, Digital Filter Design Using Computer Algebra Systems, Circuits Syst. Signal Process., Vol. 9, No., 00, pp

[4] M. Lutovac, D. Tošić, Symbolic Signal Processing and System Analysis, Facta Universitatis, Electronics and Energetics, Vol. 6, No. 3, 2003, pp

[5] T. Moon, The Expectation-Maximization Algorithm, IEEE Signal Processing Magazine, 997, pp

[6] M. D. Lutovac, V. M. Mladenović, Development of Aeronautical Communication System using Computer Algebra Systems, Proc. XXV Simposium of a New Technologies in Post Office and Telecommunication Traffic - PosTel 2007, Belgrade, December 2007

[7] B. H. Fleury, D. Dahlhaus, R. Heddergott, and M. Tschudin, Wideband Angle of Arrival Estimation using the SAGE Algorithm, Proc. IEEE Fourth Int. Symp. Spread Spectrum Techniques and Applications (ISSSTA 96), Mainz, Germany, Sept. 1996, pp

[8] D. Porrat, The Diffuse Multipath Component and Multipath Component Visibility, The 5th European Conference on Antennas and Propagation, EUR Congressi in Rome, Italy, EuCAP 0, -5 April 0

[9] M. D. Lutovac, V. M. Mladenović, Post-Processing with Advanced Symbolic Simulation and Design using SchematicSolver and Mathematica, 9th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services - TELSIKS, IEEE societies: MTT, AP, Communications, and Region 8, October 7-9, 2009, Nis, Serbia&Montenegro

[10] S. Wolfram, The Mathematica Book, Cambridge: Cambridge University Press,


