Theory and Problems of Probability, Random Variables, and Random Processes


Schaum's Outline of Theory and Problems of Probability, Random Variables, and Random Processes

Hwei P. Hsu, Ph.D.
Professor of Electrical Engineering
Fairleigh Dickinson University

McGraw-Hill Professional, 1997

HWEI P. HSU is Professor of Electrical Engineering at Fairleigh Dickinson University. He received his B.S. from National Taiwan University and his M.S. and Ph.D. from Case Institute of Technology. He has published several books, including Schaum's Outline of Analog and Digital Communications and Schaum's Outline of Signals and Systems.

Schaum's Outline of Theory and Problems of PROBABILITY, RANDOM VARIABLES, AND RANDOM PROCESSES

Copyright 1997 by The McGraw-Hill Companies, Inc. All rights reserved. Printed in the United States of America. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the publisher.

Sponsoring Editor: Arthur Biderman
Production Supervisor: Donald F. Schmidt
Editing Supervisor: Maureen Walker

Library of Congress Cataloging-in-Publication Data
Hsu, Hwei P. (Hwei Piao). Schaum's outline of theory and problems of probability, random variables, and random processes / Hwei P. Hsu. (Schaum's outline series) Includes index. 1. Probabilities: Problems, exercises, etc. 2. Probabilities: Outlines, syllabi, etc. 3. Stochastic processes: Problems, exercises, etc. 4. Stochastic processes: Outlines, syllabi, etc. I. Title.

Preface

The purpose of this book is to provide an introduction to principles of probability, random variables, and random processes and their applications.

The book is designed for students in various disciplines of engineering, science, mathematics, and management. It may be used as a textbook and/or as a supplement to all current comparable texts. It should also be useful to those interested in the field for self-study. The book combines the advantages of both the textbook and the so-called review book. It provides the textual explanations of the textbook, and in the direct way characteristic of the review book, it gives hundreds of completely solved problems that use essential theory and techniques. Moreover, the solved problems are an integral part of the text. The background required to study the book is one year of calculus, elementary differential equations, matrix analysis, and some signal and system theory, including Fourier transforms.

I wish to thank Dr. Gordon Silverman for his invaluable suggestions and critical review of the manuscript. I also wish to express my appreciation to the editorial staff of the McGraw-Hill Schaum Series for their care, cooperation, and attention devoted to the preparation of the book. Finally, I thank my wife, Daisy, for her patience and encouragement.

HWEI P. HSU
MONTVILLE, NEW JERSEY

Contents

Chapter 1. Probability
    Introduction
    Sample Space and Events
    Algebra of Sets
    The Notion and Axioms of Probability
    Equally Likely Events
    Conditional Probability
    Total Probability
    Independent Events
    Solved Problems

Chapter 2. Random Variables
    Introduction
    Random Variables
    Distribution Functions
    Discrete Random Variables and Probability Mass Functions
    Continuous Random Variables and Probability Density Functions
    Mean and Variance
    Some Special Distributions
    Conditional Distributions
    Solved Problems

Chapter 3. Multiple Random Variables
    Introduction
    Bivariate Random Variables
    Joint Distribution Functions
    Discrete Random Variables: Joint Probability Mass Functions
    Continuous Random Variables: Joint Probability Density Functions
    Conditional Distributions
    Covariance and Correlation Coefficient
    Conditional Means and Conditional Variances
    N-Variate Random Variables
    Special Distributions
    Solved Problems

Chapter 4. Functions of Random Variables, Expectation, Limit Theorems
    Introduction
    Functions of One Random Variable
    Functions of Two Random Variables
    Functions of n Random Variables
    Expectation
    Moment Generating Functions
    Characteristic Functions
    The Laws of Large Numbers and the Central Limit Theorem
    Solved Problems

Chapter 5. Random Processes
    Introduction
    Random Processes
    Characterization of Random Processes
    Classification of Random Processes
    Discrete-Parameter Markov Chains
    Poisson Processes
    Wiener Processes
    Solved Problems

Chapter 6. Analysis and Processing of Random Processes
    Introduction
    Continuity, Differentiation, Integration
    Power Spectral Densities
    White Noise
    Response of Linear Systems to Random Inputs
    Fourier Series and Karhunen-Loève Expansions
    Fourier Transform of Random Processes
    Solved Problems

Chapter 7. Estimation Theory
    Introduction
    Parameter Estimation
    Properties of Point Estimators
    Maximum-Likelihood Estimation
    Bayes' Estimation
    Mean Square Estimation
    Linear Mean Square Estimation
    Solved Problems

Chapter 8. Decision Theory
    Introduction
    Hypothesis Testing
    Decision Tests
    Solved Problems

Chapter 9. Queueing Theory
    Introduction
    Queueing Systems
    Birth-Death Process
    The M/M/1 Queueing System
    The M/M/s Queueing System
    The M/M/1/K Queueing System
    The M/M/s/K Queueing System
    Solved Problems

Appendix A. Normal Distribution
Appendix B. Fourier Transform
    B.1 Continuous-Time Fourier Transform
    B.2 Discrete-Time Fourier Transform

Index

Chapter 1

Probability

1.1 INTRODUCTION

The study of probability stems from the analysis of certain games of chance, and it has found applications in most branches of science and engineering. In this chapter the basic concepts of probability theory are presented.

1.2 SAMPLE SPACE AND EVENTS

A. Random Experiments:

In the study of probability, any process of observation is referred to as an experiment. The results of an observation are called the outcomes of the experiment. An experiment is called a random experiment if its outcome cannot be predicted. Typical examples of a random experiment are the roll of a die, the toss of a coin, drawing a card from a deck, or selecting a message signal for transmission from several messages.

B. Sample Space:

The set of all possible outcomes of a random experiment is called the sample space (or universal set), and it is denoted by S. An element in S is called a sample point. Each outcome of a random experiment corresponds to a sample point.

EXAMPLE 1.1 Find the sample space for the experiment of tossing a coin (a) once and (b) twice.

(a) There are two possible outcomes, heads or tails. Thus
$$S = \{H, T\}$$
where H and T represent head and tail, respectively.

(b) There are four possible outcomes. They are pairs of heads and tails. Thus
$$S = \{HH, HT, TH, TT\}$$

EXAMPLE 1.2 Find the sample space for the experiment of tossing a coin repeatedly and of counting the number of tosses required until the first head appears.

Clearly all possible outcomes for this experiment are the terms of the sequence 1, 2, 3, .... Thus
$$S = \{1, 2, 3, \ldots\}$$
Note that there are an infinite number of outcomes.

EXAMPLE 1.3 Find the sample space for the experiment of measuring (in hours) the lifetime of a transistor.

Clearly all possible outcomes are all nonnegative real numbers. That is,
$$S = \{\tau : 0 \le \tau < \infty\}$$
where $\tau$ represents the life of a transistor in hours.

Note that any particular experiment can often have many different sample spaces depending on the observation of interest (Probs. 1.1 and 1.2). A sample space S is said to be discrete if it consists of a finite number of

sample points (as in Example 1.1) or countably infinite sample points (as in Example 1.2). A set is called countable if its elements can be placed in a one-to-one correspondence with the positive integers. A sample space S is said to be continuous if the sample points constitute a continuum (as in Example 1.3).

C. Events:

Since we have identified a sample space S as the set of all possible outcomes of a random experiment, we will review some set notations in the following.

If $\zeta$ is an element of S (or belongs to S), then we write
$$\zeta \in S$$
If $\zeta$ is not an element of S (or does not belong to S), then we write
$$\zeta \notin S$$
A set A is called a subset of B, denoted by
$$A \subset B$$
if every element of A is also an element of B. Any subset of the sample space S is called an event. A sample point of S is often referred to as an elementary event. Note that the sample space S is the subset of itself, that is, $S \subset S$. Since S is the set of all possible outcomes, it is often called the certain event.

EXAMPLE 1.4 Consider the experiment of Example 1.2. Let A be the event that the number of tosses required until the first head appears is even. Let B be the event that the number of tosses required until the first head appears is odd. Let C be the event that the number of tosses required until the first head appears is less than 5. Express events A, B, and C.

$$A = \{2, 4, 6, \ldots\} \qquad B = \{1, 3, 5, \ldots\} \qquad C = \{1, 2, 3, 4\}$$

1.3 ALGEBRA OF SETS

A. Set Operations:

1. Equality: Two sets A and B are equal, denoted $A = B$, if and only if $A \subset B$ and $B \subset A$.

2. Complementation: Suppose $A \subset S$. The complement of set A, denoted $\bar{A}$, is the set containing all elements in S but not in A.
$$\bar{A} = \{\zeta : \zeta \in S \text{ and } \zeta \notin A\}$$

3. Union: The union of sets A and B, denoted $A \cup B$, is the set containing all elements in either A or B or both.
$$A \cup B = \{\zeta : \zeta \in A \text{ or } \zeta \in B\}$$

4. Intersection: The intersection of sets A and B, denoted $A \cap B$, is the set containing all elements in both A and B.
$$A \cap B = \{\zeta : \zeta \in A \text{ and } \zeta \in B\}$$

5. Null Set: The set containing no element is called the null set, denoted $\varnothing$. Note that
$$\varnothing = \bar{S}$$

6. Disjoint Sets: Two sets A and B are called disjoint or mutually exclusive if they contain no common element, that is, if $A \cap B = \varnothing$.

The definitions of the union and intersection of two sets can be extended to any finite number of sets as follows:
$$\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n = \{\zeta : \zeta \in A_1 \text{ or } \zeta \in A_2 \text{ or } \cdots \text{ or } \zeta \in A_n\}$$
$$\bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n = \{\zeta : \zeta \in A_1 \text{ and } \zeta \in A_2 \text{ and } \cdots \text{ and } \zeta \in A_n\}$$
Note that these definitions can be extended to an infinite number of sets:
$$\bigcup_{i=1}^{\infty} A_i = A_1 \cup A_2 \cup A_3 \cup \cdots \qquad \bigcap_{i=1}^{\infty} A_i = A_1 \cap A_2 \cap A_3 \cap \cdots$$

In our definition of event, we state that every subset of S is an event, including S and the null set $\varnothing$. Then

S = the certain event
$\varnothing$ = the impossible event

If A and B are events in S, then

$\bar{A}$ = the event that A did not occur
$A \cup B$ = the event that either A or B or both occurred
$A \cap B$ = the event that both A and B occurred

Similarly, if $A_1, A_2, \ldots, A_n$ are a sequence of events in S, then

$\bigcup_{i=1}^{n} A_i$ = the event that at least one of the $A_i$ occurred;
$\bigcap_{i=1}^{n} A_i$ = the event that all of the $A_i$ occurred.

B. Venn Diagram:

A graphical representation that is very useful for illustrating set operations is the Venn diagram. For instance, in the three Venn diagrams shown in Fig. 1-1, the shaded areas represent, respectively, the events $A \cup B$, $A \cap B$, and $\bar{A}$. The Venn diagram in Fig. 1-2 indicates that $B \subset A$, and the event $A \cap \bar{B}$ is shown as the shaded area.
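The set operations defined above map directly onto finite sets in code. The following is a minimal sketch in Python, assuming the events of Example 1.4 are truncated to the first 20 tosses so that they can be stored as finite sets. The cutoff of 20 and the helper name `complement` are illustrative assumptions, not part of the text.

```python
# Truncated sample space of Example 1.2: number of tosses until the first head.
# The cutoff at 20 is an assumption made only so that the sets are finite.
S = set(range(1, 21))

A = {k for k in S if k % 2 == 0}   # even number of tosses
B = {k for k in S if k % 2 == 1}   # odd number of tosses
C = {k for k in S if k < 5}        # fewer than 5 tosses


def complement(event, space):
    """Return the elements of the sample space that are not in the event."""
    return space - event


union_ab = A | B                   # A ∪ B
inter_ac = A & C                   # A ∩ C
comp_a = complement(A, S)          # complement of A relative to S
disjoint_ab = A.isdisjoint(B)      # True when A ∩ B is the null set
```

Here `union_ab` recovers the whole truncated sample space and `comp_a` coincides with `B`, which mirrors the fact that the "even" and "odd" events are mutually exclusive and exhaustive.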

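[Fig. 1-1: (a) Shaded region: $A \cup B$; (b) Shaded region: $A \cap B$; (c) Shaded region: $\bar{A}$]

[Fig. 1-2: $B \subset A$; shaded region: $A \cap \bar{B}$]

C. Identities:

By the above set definitions or reference to Fig. 1-1, we obtain the following identities:
$$\bar{S} = \varnothing \tag{1.1}$$
$$\bar{\varnothing} = S \tag{1.2}$$
$$\bar{\bar{A}} = A \tag{1.3}$$
$$S \cup A = S \tag{1.4}$$
$$S \cap A = A \tag{1.5}$$
$$A \cup \bar{A} = S \tag{1.6}$$
$$A \cap \bar{A} = \varnothing \tag{1.7}$$

The union and intersection operations also satisfy the following laws:

Commutative Laws:
$$A \cup B = B \cup A \tag{1.8}$$
$$A \cap B = B \cap A \tag{1.9}$$

Associative Laws:
$$A \cup (B \cup C) = (A \cup B) \cup C \tag{1.10}$$
$$A \cap (B \cap C) = (A \cap B) \cap C \tag{1.11}$$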

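Distributive Laws:
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \tag{1.12}$$
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \tag{1.13}$$

De Morgan's Laws:
$$\overline{A \cup B} = \bar{A} \cap \bar{B} \tag{1.14}$$
$$\overline{A \cap B} = \bar{A} \cup \bar{B} \tag{1.15}$$

These relations are verified by showing that any element that is contained in the set on the left side of the equality sign is also contained in the set on the right side, and vice versa. One way of showing this is by means of a Venn diagram (Prob. 1.13). The distributive laws can be extended as follows:
$$A \cap \left(\bigcup_{i=1}^{n} B_i\right) = \bigcup_{i=1}^{n} (A \cap B_i) \tag{1.16}$$
$$A \cup \left(\bigcap_{i=1}^{n} B_i\right) = \bigcap_{i=1}^{n} (A \cup B_i) \tag{1.17}$$
Similarly, De Morgan's laws also can be extended as follows (Prob. 1.17):
$$\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar{A}_i \tag{1.18}$$
$$\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar{A}_i \tag{1.19}$$

1.4 THE NOTION AND AXIOMS OF PROBABILITY

An assignment of real numbers to the events defined in a sample space S is known as the probability measure. Consider a random experiment with a sample space S, and let A be a particular event defined in S.

A. Relative Frequency Definition:

Suppose that the random experiment is repeated n times. If event A occurs n(A) times, then the probability of event A, denoted P(A), is defined as
$$P(A) = \lim_{n \to \infty} \frac{n(A)}{n} \tag{1.20}$$
where n(A)/n is called the relative frequency of event A. Note that this limit may not exist, and in addition, there are many situations in which the concepts of repeatability may not be valid. It is clear that for any event A, the relative frequency of A will have the following properties:

1. $0 \le n(A)/n \le 1$, where $n(A)/n = 0$ if A occurs in none of the n repeated trials and $n(A)/n = 1$ if A occurs in all of the n repeated trials.

2. If A and B are mutually exclusive events, then
$$n(A \cup B) = n(A) + n(B) \qquad \text{and} \qquad \frac{n(A \cup B)}{n} = \frac{n(A)}{n} + \frac{n(B)}{n}$$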

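B. Axiomatic Definition:

Let S be a finite sample space and A be an event in S. Then in the axiomatic definition, the probability P(A) of the event A is a real number assigned to A which satisfies the following three axioms:

Axiom 1: $$P(A) \ge 0 \tag{1.21}$$
Axiom 2: $$P(S) = 1 \tag{1.22}$$
Axiom 3: $$P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \varnothing \tag{1.23}$$

If the sample space S is not finite, then axiom 3 must be modified as follows:

Axiom 3': If $A_1, A_2, \ldots$ is an infinite sequence of mutually exclusive events in S ($A_i \cap A_j = \varnothing$ for $i \ne j$), then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i) \tag{1.24}$$

These axioms satisfy our intuitive notion of probability measure obtained from the notion of relative frequency.

C. Elementary Properties of Probability:

By using the above axioms, the following useful properties of probability can be obtained:

1. $$P(\bar{A}) = 1 - P(A) \tag{1.25}$$
2. $$P(\varnothing) = 0 \tag{1.26}$$
3. $$P(A) \le P(B) \quad \text{if } A \subset B \tag{1.27}$$
4. $$P(A) \le 1 \tag{1.28}$$
5. $$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{1.29}$$
6. If $A_1, A_2, \ldots, A_n$ are n arbitrary events in S, then
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n) \tag{1.30}$$
where the sum of the second term is over all distinct pairs of events, that of the third term is over all distinct triples of events, and so forth.
7. If $A_1, A_2, \ldots, A_n$ is a finite sequence of mutually exclusive events in S ($A_i \cap A_j = \varnothing$ for $i \ne j$), then
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) \tag{1.31}$$
and a similar equality holds for any subcollection of the events.

Note that property 4 can be easily derived from axiom 2 and property 3. Since $A \subset S$, we have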

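$$P(A) \le P(S) = 1$$
Thus, combining with axiom 1, we obtain
$$0 \le P(A) \le 1 \tag{1.32}$$
Property 5 implies that
$$P(A \cup B) \le P(A) + P(B) \tag{1.33}$$
since $P(A \cap B) \ge 0$ by axiom 1.

1.5 EQUALLY LIKELY EVENTS

A. Finite Sample Space:

Consider a finite sample space S with n elements
$$S = \{\zeta_1, \zeta_2, \ldots, \zeta_n\}$$
where the $\zeta_i$'s are elementary events. Let $P(\zeta_i) = p_i$. Then

1. $$0 \le p_i \le 1 \qquad i = 1, 2, \ldots, n \tag{1.34}$$
2. $$\sum_{i=1}^{n} p_i = p_1 + p_2 + \cdots + p_n = 1 \tag{1.35}$$
3. If $A = \bigcup_{i \in I} \zeta_i$, where I is a collection of subscripts, then
$$P(A) = \sum_{\zeta_i \in A} P(\zeta_i) = \sum_{i \in I} p_i \tag{1.36}$$

B. Equally Likely Events:

When all elementary events $\zeta_i$ ($i = 1, 2, \ldots, n$) are equally likely, that is,
$$p_1 = p_2 = \cdots = p_n$$
then from Eq. (1.35), we have
$$p_i = \frac{1}{n} \qquad i = 1, 2, \ldots, n \tag{1.37}$$
and
$$P(A) = \frac{n(A)}{n} \tag{1.38}$$
where n(A) is the number of outcomes belonging to event A and n is the number of sample points in S.

1.6 CONDITIONAL PROBABILITY

A. Definition:

The conditional probability of an event A given event B, denoted by $P(A \mid B)$, is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad P(B) > 0 \tag{1.39}$$
where $P(A \cap B)$ is the joint probability of A and B. Similarly,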

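$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \qquad P(A) > 0 \tag{1.40}$$
is the conditional probability of an event B given event A. From Eqs. (1.39) and (1.40), we have
$$P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A) \tag{1.41}$$
Equation (1.41) is often quite useful in computing the joint probability of events.

B. Bayes' Rule:

From Eq. (1.41) we can obtain the following Bayes' rule:
$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)} \tag{1.42}$$

1.7 TOTAL PROBABILITY

The events $A_1, A_2, \ldots, A_n$ are called mutually exclusive and exhaustive if
$$\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n = S \qquad \text{and} \qquad A_i \cap A_j = \varnothing \quad i \ne j \tag{1.43}$$
Let B be any event in S. Then
$$P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B \mid A_i)P(A_i) \tag{1.44}$$
which is known as the total probability of event B (Prob. 1.47). Let $A = A_i$ in Eq. (1.42); then, using Eq. (1.44), we obtain
$$P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)P(A_j)} \tag{1.45}$$
Note that the terms on the right-hand side are all conditioned on events $A_i$, while the term on the left is conditioned on B. Equation (1.45) is sometimes referred to as Bayes' theorem.

1.8 INDEPENDENT EVENTS

Two events A and B are said to be (statistically) independent if and only if
$$P(A \cap B) = P(A)P(B) \tag{1.46}$$
It follows immediately that if A and B are independent, then by Eqs. (1.39) and (1.40),
$$P(A \mid B) = P(A) \qquad \text{and} \qquad P(B \mid A) = P(B) \tag{1.47}$$
If two events A and B are independent, then it can be shown that A and $\bar{B}$ are also independent; that is (Prob. 1.53),
$$P(A \cap \bar{B}) = P(A)P(\bar{B}) \tag{1.48}$$
Then
$$P(A \mid \bar{B}) = \frac{P(A \cap \bar{B})}{P(\bar{B})} = P(A) \tag{1.49}$$
Thus, if A is independent of B, then the probability of A's occurrence is unchanged by information as to whether or not B has occurred. Three events A, B, C are said to be independent if and only if
$$\begin{aligned} P(A \cap B \cap C) &= P(A)P(B)P(C) \\ P(A \cap B) &= P(A)P(B) \\ P(A \cap C) &= P(A)P(C) \\ P(B \cap C) &= P(B)P(C) \end{aligned} \tag{1.50}$$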
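Independence of three events requires the product rule to hold for every pair and also for the triple intersection; the pairwise conditions alone are not enough. The sketch below illustrates this by enumeration in Python. The experiment (two tosses of a fair coin) and the events `A`, `B`, `C` are an illustrative choice made here and are not taken from the text; `prob` is a hypothetical helper that applies the equally-likely formula $P(A) = n(A)/n$.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin; all four outcomes equally likely.
S = set(product("HT", repeat=2))


def prob(event):
    """Probability of an event under equally likely outcomes: n(A) / n."""
    return Fraction(len(event), len(S))


A = {s for s in S if s[0] == "H"}    # head on the first toss
B = {s for s in S if s[1] == "H"}    # head on the second toss
C = {s for s in S if s[0] == s[1]}   # both tosses show the same face

# Product rule for each pair of events.
pairwise_ok = all(
    prob(X & Y) == prob(X) * prob(Y) for X, Y in [(A, B), (A, C), (B, C)]
)

# Product rule for the triple intersection.
triple_ok = prob(A & B & C) == prob(A) * prob(B) * prob(C)
```

For this choice every pair satisfies the product rule, yet `triple_ok` is false, so A, B, and C are pairwise independent without being independent as a triple.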

We may also extend the definition of independence to more than three events. The events $A_1, A_2, \ldots, A_n$ are independent if and only if for every subset $\{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\}$ ($2 \le k \le n$) of these events,
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_k}) \tag{1.51}$$
Finally, we define an infinite set of events to be independent if and only if every finite subset of these events is independent.

To distinguish between the mutual exclusiveness (or disjointness) and independence of a collection of events we summarize as follows:

1. If $\{A_i, i = 1, 2, \ldots, n\}$ is a sequence of mutually exclusive events, then
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) \tag{1.52}$$
2. If $\{A_i, i = 1, 2, \ldots, n\}$ is a sequence of independent events, then
$$P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i) \tag{1.53}$$
and a similar equality holds for any subcollection of the events.

Solved Problems

SAMPLE SPACE AND EVENTS

1.1. Consider a random experiment of tossing a coin three times.
(a) Find the sample space $S_1$ if we wish to observe the exact sequences of heads and tails obtained.
(b) Find the sample space $S_2$ if we wish to observe the number of heads in the three tosses.

(a) The sample space $S_1$ is given by
$$S_1 = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$
where, for example, HTH indicates a head on the first and third throws and a tail on the second throw. There are eight sample points in $S_1$.

(b) The sample space $S_2$ is given by
$$S_2 = \{0, 1, 2, 3\}$$
where, for example, the outcome 2 indicates that two heads were obtained in the three tosses. The sample space $S_2$ contains four sample points.

1.2. Consider an experiment of drawing two cards at random from a bag containing four cards marked with the integers 1 through 4.
(a) Find the sample space $S_1$ of the experiment if the first card is replaced before the second is drawn.
(b) Find the sample space $S_2$ of the experiment if the first card is not replaced.

(a) The sample space $S_1$ contains 16 ordered pairs $(i, j)$, $1 \le i \le 4$, $1 \le j \le 4$, where the first number indicates the first number drawn. Thus,
$$S_1 = \left\{ \begin{matrix} (1,1) & (1,2) & (1,3) & (1,4) \\ (2,1) & (2,2) & (2,3) & (2,4) \\ (3,1) & (3,2) & (3,3) & (3,4) \\ (4,1) & (4,2) & (4,3) & (4,4) \end{matrix} \right\}$$

(b) The sample space $S_2$ contains 12 ordered pairs $(i, j)$, $i \ne j$, $1 \le i \le 4$, $1 \le j \le 4$, where the first number indicates the first number drawn. Thus,
$$S_2 = \left\{ \begin{matrix} (1,2) & (1,3) & (1,4) \\ (2,1) & (2,3) & (2,4) \\ (3,1) & (3,2) & (3,4) \\ (4,1) & (4,2) & (4,3) \end{matrix} \right\}$$

1.3. An experiment consists of rolling a die until a 6 is obtained.
(a) Find the sample space $S_1$ if we are interested in all possibilities.
(b) Find the sample space $S_2$ if we are interested in the number of throws needed to get a 6.

(a) The sample space $S_1$ would be
$$S_1 = \left\{ \begin{matrix} 6, \\ 16, 26, 36, 46, 56, \\ 116, 126, 136, 146, 156, \ldots \\ \vdots \end{matrix} \right\}$$
where the first line indicates that a 6 is obtained in one throw, the second line indicates that a 6 is obtained in two throws, and so forth.

(b) In this case, the sample space $S_2$ is
$$S_2 = \{i : i \ge 1\} = \{1, 2, 3, \ldots\}$$
where i is an integer representing the number of throws needed to get a 6.

1.4. Find the sample space for the experiment consisting of measurement of the voltage output v from a transducer, the maximum and minimum of which are +5 and -5 volts, respectively.

A suitable sample space for this experiment would be
$$S = \{v : -5 \le v \le 5\}$$

1.5. An experiment consists of tossing two dice.
(a) Find the sample space S.
(b) Find the event A that the sum of the dots on the dice equals 7.
(c) Find the event B that the sum of the dots on the dice is greater than 10.
(d) Find the event C that the sum of the dots on the dice is greater than 12.

(a) For this experiment, the sample space S consists of 36 points (Fig. 1-3):
$$S = \{(i, j) : i, j = 1, 2, 3, 4, 5, 6\}$$
where i represents the number of dots appearing on one die and j represents the number of dots appearing on the other die.

(b) The event A consists of 6 points (see Fig. 1-3):
$$A = \{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\}$$

(c) The event B consists of 3 points (see Fig. 1-3):
$$B = \{(5,6), (6,5), (6,6)\}$$

(d) The event C is an impossible event, that is, $C = \varnothing$.
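The events of Prob. 1.5 are small enough to check by brute-force enumeration. The following Python sketch builds the 36-point sample space and filters it by the sum of the dots; the variable names are illustrative and not part of the text.

```python
from itertools import product

# Sample space for two dice: all ordered pairs (i, j) with i, j in 1..6.
S = set(product(range(1, 7), repeat=2))

A = {(i, j) for (i, j) in S if i + j == 7}    # sum equals 7
B = {(i, j) for (i, j) in S if i + j > 10}    # sum greater than 10
C = {(i, j) for (i, j) in S if i + j > 12}    # sum greater than 12 (impossible)

print(len(S), sorted(A), sorted(B), C)
```

Because the largest possible sum is 12, the filter for `C` selects nothing, which is the enumeration counterpart of calling C the impossible event.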

[Fig. 1-3: the 36 sample points of S, with event A marked]

1.6. An automobile dealer offers vehicles with the following options:
(a) With or without automatic transmission
(b) With or without air-conditioning
(c) With one of two choices of a stereo system
(d) With one of three exterior colors
If the sample space consists of the set of all possible vehicle types, what is the number of outcomes in the sample space?

The tree diagram for the different types of vehicles is shown in Fig. 1-4. From Fig. 1-4 we see that the number of sample points in S is 2 × 2 × 2 × 3 = 24.

[Fig. 1-4: tree diagram with levels Transmission, Air-conditioning, Stereo, Color]

1.7. State every possible event in the sample space $S = \{a, b, c, d\}$.

There are $2^4 = 16$ possible events in S. They are $\varnothing$; $\{a\}, \{b\}, \{c\}, \{d\}$; $\{a, b\}, \{a, c\}, \{a, d\}, \{b, c\}, \{b, d\}, \{c, d\}$; $\{a, b, c\}, \{a, b, d\}, \{a, c, d\}, \{b, c, d\}$; $S = \{a, b, c, d\}$.

1.8. How many events are there in a sample space S with n elementary events?

Let $S = \{s_1, s_2, \ldots, s_n\}$. Let $\Omega$ be the family of all subsets of S. ($\Omega$ is sometimes referred to as the power set of S.) Let $S_i$ be the set consisting of two statements, that is,
$$S_i = \{\text{Yes, the } s_i \text{ is in; No, the } s_i \text{ is not in}\}$$
Then $\Omega$ can be represented as the Cartesian product
$$\Omega = S_1 \times S_2 \times \cdots \times S_n = \{(s_1, s_2, \ldots, s_n) : s_i \in S_i \text{ for } i = 1, 2, \ldots, n\}$$

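Since each subset of S can be uniquely characterized by an element in the above Cartesian product, we obtain the number of elements in $\Omega$ by
$$n(\Omega) = n(S_1)n(S_2) \cdots n(S_n) = 2^n$$
where $n(S_i)$ = number of elements in $S_i$ = 2.

An alternative way of finding $n(\Omega)$ is by the following summation:
$$n(\Omega) = \sum_{i=0}^{n} \binom{n}{i} = \sum_{i=0}^{n} \frac{n!}{i!\,(n-i)!}$$
The proof that the last sum is equal to $2^n$ is not easy.

ALGEBRA OF SETS

1.9. Consider the experiment of Example 1.2. We define the events
$$A = \{k : k \text{ is odd}\} \qquad B = \{k : 4 \le k \le 7\} \qquad C = \{k : 1 \le k \le 10\}$$
where k is the number of tosses required until the first H (head) appears. Determine the events $\bar{A}$, $\bar{B}$, $\bar{C}$, $A \cup B$, $B \cup C$, $A \cap B$, $A \cap C$, $B \cap C$, and $\bar{A} \cap B$.

$$\bar{A} = \{k : k \text{ is even}\} = \{2, 4, 6, \ldots\}$$
$$\bar{B} = \{k : k = 1, 2, 3 \text{ or } k \ge 8\}$$
$$\bar{C} = \{k : k \ge 11\}$$
$$A \cup B = \{k : k \text{ is odd or } k = 4, 6\}$$
$$B \cup C = C$$
$$A \cap B = \{5, 7\}$$
$$A \cap C = \{1, 3, 5, 7, 9\}$$
$$B \cap C = B$$
$$\bar{A} \cap B = \{4, 6\}$$

1.10. The sample space of an experiment is the real line expressed as
$$S = \{v : -\infty < v < \infty\}$$
(a) Consider the events
$$A_1 = \{v : 0 \le v < \tfrac{1}{2}\} \qquad A_2 = \{v : \tfrac{1}{2} \le v < \tfrac{3}{4}\} \qquad \ldots$$
Determine the events
$$\bigcup_{i=1}^{\infty} A_i \qquad \text{and} \qquad \bigcap_{i=1}^{\infty} A_i$$
(b) Consider the events
$$B_1 = \{v : v \le \tfrac{1}{2}\} \qquad B_2 = \{v : v \le \tfrac{1}{4}\} \qquad \ldots$$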

Determine the events
$$\bigcup_{i=1}^{\infty} B_i \qquad \text{and} \qquad \bigcap_{i=1}^{\infty} B_i$$

(a) It is clear that
$$\bigcup_{i=1}^{\infty} A_i = \{v : 0 \le v < 1\}$$
Noting that the $A_i$'s are mutually exclusive, we have
$$\bigcap_{i=1}^{\infty} A_i = \varnothing$$

(b) Noting that $B_1 \supset B_2 \supset \cdots \supset B_i \supset \cdots$, we have
$$\bigcup_{i=1}^{\infty} B_i = B_1 = \{v : v \le \tfrac{1}{2}\} \qquad \text{and} \qquad \bigcap_{i=1}^{\infty} B_i = \{v : v \le 0\}$$

1.11. Consider the switching networks shown in Fig. 1-5. Let $A_1$, $A_2$, and $A_3$ denote the events that the switches $s_1$, $s_2$, and $s_3$ are closed, respectively. Let $A_{ab}$ denote the event that there is a closed path between terminals a and b. Express $A_{ab}$ in terms of $A_1$, $A_2$, and $A_3$ for each of the networks shown.

[Fig. 1-5: switching networks (a), (b), (c), and (d)]

(a) From Fig. 1-5(a), we see that there is a closed path between a and b only if all switches $s_1$, $s_2$, and $s_3$ are closed. Thus,
$$A_{ab} = A_1 \cap A_2 \cap A_3$$

(b) From Fig. 1-5(b), we see that there is a closed path between a and b if at least one switch is closed. Thus,
$$A_{ab} = A_1 \cup A_2 \cup A_3$$

(c) From Fig. 1-5(c), we see that there is a closed path between a and b if $s_1$ and either $s_2$ or $s_3$ are closed. Thus,
$$A_{ab} = A_1 \cap (A_2 \cup A_3)$$
Using the distributive law (1.12), we have
$$A_{ab} = (A_1 \cap A_2) \cup (A_1 \cap A_3)$$
which indicates that there is a closed path between a and b if $s_1$ and $s_2$ or $s_1$ and $s_3$ are closed.

(d) From Fig. 1-5(d), we see that there is a closed path between a and b if either $s_1$ and $s_2$ are closed or $s_3$ is closed. Thus,
$$A_{ab} = (A_1 \cap A_2) \cup A_3$$
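The four network expressions of Prob. 1.11 can be checked by listing all eight switch states. The sketch below is a hedged illustration in Python: it assumes the subscripts read as $A_1 \cap A_2 \cap A_3$, $A_1 \cup A_2 \cup A_3$, $A_1 \cap (A_2 \cup A_3)$, and $(A_1 \cap A_2) \cup A_3$ for networks (a) to (d), and the function names are hypothetical.

```python
from itertools import product

# Every combination of (s1, s2, s3), where True means the switch is closed.
states = list(product([False, True], repeat=3))


def net_a(s1, s2, s3):
    """Series connection: all three switches must be closed."""
    return s1 and s2 and s3


def net_b(s1, s2, s3):
    """Parallel connection: at least one switch must be closed."""
    return s1 or s2 or s3


def net_c(s1, s2, s3):
    """s1 in series with the parallel pair (s2, s3)."""
    return s1 and (s2 or s3)


def net_c_expanded(s1, s2, s3):
    """Distributive-law form of network (c)."""
    return (s1 and s2) or (s1 and s3)


def net_d(s1, s2, s3):
    """The series pair (s1, s2) in parallel with s3."""
    return (s1 and s2) or s3


# Number of switch states that give a closed path for each network.
closed_counts = {
    name: sum(f(*s) for s in states)
    for name, f in [("a", net_a), ("b", net_b), ("c", net_c), ("d", net_d)]
}

# The two forms of network (c) agree on every state, as the distributive law says.
distributive_holds = all(net_c(*s) == net_c_expanded(*s) for s in states)
```

Counting closed states per network gives a quick sanity check: the series network closes in exactly one state, while the parallel network fails only when all three switches are open.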

1.12. Verify the distributive law (1.12).

Let $s \in [A \cap (B \cup C)]$. Then $s \in A$ and $s \in (B \cup C)$. This means either that $s \in A$ and $s \in B$ or that $s \in A$ and $s \in C$; that is, $s \in (A \cap B)$ or $s \in (A \cap C)$. Therefore,
$$A \cap (B \cup C) \subset [(A \cap B) \cup (A \cap C)]$$
Next, let $s \in [(A \cap B) \cup (A \cap C)]$. Then $s \in A$ and $s \in B$, or $s \in A$ and $s \in C$. Thus $s \in A$ and ($s \in B$ or $s \in C$). Thus,
$$[(A \cap B) \cup (A \cap C)] \subset A \cap (B \cup C)$$
Thus, by the definition of equality, we have
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$

1.13. Using a Venn diagram, repeat Prob. 1.12.

Figure 1-6 shows the sequence of relevant Venn diagrams. Comparing Fig. 1-6(b) and 1-6(e), we conclude that
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$

[Fig. 1-6: (a) Shaded region: $B \cup C$; (b) Shaded region: $A \cap (B \cup C)$; (c) Shaded region: $A \cap B$; (d) Shaded region: $A \cap C$; (e) Shaded region: $(A \cap B) \cup (A \cap C)$]

1.14. Let A and B be arbitrary events. Show that $A \subset B$ if and only if $A \cap B = A$.

"If" part: We show that if $A \cap B = A$, then $A \subset B$. Let $s \in A$. Then $s \in (A \cap B)$, since $A = A \cap B$. Then by the definition of intersection, $s \in B$. Therefore, $A \subset B$.

"Only if" part: We show that if $A \subset B$, then $A \cap B = A$. Note that from the definition of the intersection, $(A \cap B) \subset A$. Suppose $s \in A$. If $A \subset B$, then $s \in B$. So $s \in A$ and $s \in B$; that is, $s \in (A \cap B)$. Therefore, it follows that $A \subset (A \cap B)$. Hence, $A = A \cap B$. This completes the proof.

1.15. Let A be an arbitrary event in S and $\varnothing$ be the null event. Show that
(a) $$A \cup \varnothing = A \tag{1.54}$$
(b) $$A \cap \varnothing = \varnothing \tag{1.55}$$

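(a) $$A \cup \varnothing = \{s : s \in A \text{ or } s \in \varnothing\}$$
But, by definition, there are no $s \in \varnothing$. Thus,
$$A \cup \varnothing = \{s : s \in A\} = A$$

(b) $$A \cap \varnothing = \{s : s \in A \text{ and } s \in \varnothing\}$$
But, since there are no $s \in \varnothing$, there cannot be an s such that $s \in A$ and $s \in \varnothing$. Thus,
$$A \cap \varnothing = \varnothing$$
Note that Eq. (1.55) shows that $\varnothing$ is mutually exclusive with every other event, including itself.

1.16. Show that the null (or empty) set is a subset of every set A.

From the definition of intersection, it follows that
$$(A \cap B) \subset A \qquad \text{and} \qquad (A \cap B) \subset B \tag{1.56}$$
for any pair of events, whether they are mutually exclusive or not. If A and B are mutually exclusive events, that is, $A \cap B = \varnothing$, then by Eq. (1.56) we obtain
$$\varnothing \subset A \qquad \text{and} \qquad \varnothing \subset B \tag{1.57}$$
Therefore, for any event A,
$$\varnothing \subset A \tag{1.58}$$
that is, $\varnothing$ is a subset of every set A.

1.17. Verify Eqs. (1.18) and (1.19).

Suppose first that $s \in \overline{\bigcup_{i=1}^{n} A_i}$; then $s \notin \bigcup_{i=1}^{n} A_i$. That is, if s is not contained in any of the events $A_i$, $i = 1, 2, \ldots, n$, then s is contained in $\bar{A}_i$ for all $i = 1, 2, \ldots, n$. Thus
$$s \in \bigcap_{i=1}^{n} \bar{A}_i$$
Next, we assume that
$$s \in \bigcap_{i=1}^{n} \bar{A}_i$$
Then s is contained in $\bar{A}_i$ for all $i = 1, 2, \ldots, n$, which means that s is not contained in $A_i$ for any $i = 1, 2, \ldots, n$, implying that
$$s \notin \bigcup_{i=1}^{n} A_i$$
Thus,
$$s \in \overline{\bigcup_{i=1}^{n} A_i}$$
This proves Eq. (1.18).

Using Eqs. (1.18) and (1.3), we have
$$\overline{\bigcup_{i=1}^{n} \bar{A}_i} = \bigcap_{i=1}^{n} A_i$$
Taking complements of both sides of the above yields
$$\bigcup_{i=1}^{n} \bar{A}_i = \overline{\bigcap_{i=1}^{n} A_i}$$
which is Eq. (1.19).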
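A proof covers every case, but a concrete instance can still help confirm that the two sides of Eqs. (1.18) and (1.19) are being read correctly. The following Python sketch checks both extended De Morgan laws on one small family of sets. The sample space `S`, the family `events`, and the helper `comp` are arbitrary illustrative choices, and a single passing instance is of course not a proof.

```python
# A small sample space and an arbitrary family of three events (assumed values).
S = set(range(10))
events = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6, 7}]


def comp(event):
    """Complement of an event relative to the sample space S."""
    return S - event


union_all = set().union(*events)
inter_all = set.intersection(*events)
complements = [comp(e) for e in events]

# Eq. (1.18): the complement of the union is the intersection of the complements.
lhs_18 = comp(union_all)
rhs_18 = set.intersection(*complements)

# Eq. (1.19): the complement of the intersection is the union of the complements.
lhs_19 = comp(inter_all)
rhs_19 = set().union(*complements)
```

In this instance the three events have an empty common intersection, so both sides of Eq. (1.19) evaluate to the whole sample space.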

THE NOTION AND AXIOMS OF PROBABILITY

1.18. Using the axioms of probability, prove Eq. (1.25).

We have
$$S = A \cup \bar{A} \qquad \text{and} \qquad A \cap \bar{A} = \varnothing$$
Thus, by axioms 2 and 3, it follows that
$$P(S) = 1 = P(A) + P(\bar{A})$$
from which we obtain
$$P(\bar{A}) = 1 - P(A)$$

1.19. Verify Eq. (1.26).

From Eq. (1.25), we have
$$P(\bar{A}) = 1 - P(A)$$
Let $A = S$. Then, by Eq. (1.1), $\bar{A} = \bar{S} = \varnothing$, and by axiom 2 we obtain
$$P(\varnothing) = 1 - P(S) = 1 - 1 = 0$$

1.20. Verify Eq. (1.27).

Let $A \subset B$. Then from the Venn diagram shown in Fig. 1-7, we see that
$$B = A \cup (\bar{A} \cap B) \qquad \text{and} \qquad A \cap (\bar{A} \cap B) = \varnothing$$
Hence, from axiom 3,
$$P(B) = P(A) + P(\bar{A} \cap B)$$
However, by axiom 1, $P(\bar{A} \cap B) \ge 0$. Thus, we conclude that
$$P(A) \le P(B) \quad \text{if } A \subset B$$

[Fig. 1-7: Shaded region: $\bar{A} \cap B$]

1.21. Verify Eq. (1.29).

From the Venn diagram of Fig. 1-8, each of the sets $A \cup B$ and B can be represented, respectively, as a union of mutually exclusive sets as follows:
$$A \cup B = A \cup (\bar{A} \cap B) \qquad \text{and} \qquad B = (A \cap B) \cup (\bar{A} \cap B)$$
Thus, by axiom 3,
$$P(A \cup B) = P(A) + P(\bar{A} \cap B) \tag{1.60}$$
and
$$P(B) = P(A \cap B) + P(\bar{A} \cap B) \tag{1.61}$$
From Eq. (1.61), we have
$$P(\bar{A} \cap B) = P(B) - P(A \cap B) \tag{1.62}$$
Substituting Eq. (1.62) into Eq. (1.60), we obtain
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

[Fig. 1-8: Shaded region: $\bar{A} \cap B$; Shaded region: $A \cap B$]

1.22. Let P(A) = 0.9 and P(B) = 0.8. Show that $P(A \cap B) \ge 0.7$.

From Eq. (1.29), we have
$$P(A \cap B) = P(A) + P(B) - P(A \cup B)$$
By Eq. (1.32), $0 \le P(A \cup B) \le 1$. Hence
$$P(A \cap B) \ge P(A) + P(B) - 1 \tag{1.63}$$
Substituting the given values of P(A) and P(B) in Eq. (1.63), we get
$$P(A \cap B) \ge 0.9 + 0.8 - 1 = 0.7$$
Equation (1.63) is known as Bonferroni's inequality.

1.23. Show that
$$P(A) = P(A \cap B) + P(A \cap \bar{B}) \tag{1.64}$$

From the Venn diagram of Fig. 1-9, we see that
$$A = (A \cap B) \cup (A \cap \bar{B}) \qquad \text{and} \qquad (A \cap B) \cap (A \cap \bar{B}) = \varnothing$$
Thus, by axiom 3, we have
$$P(A) = P(A \cap B) + P(A \cap \bar{B})$$

[Fig. 1-9: regions $A \cap \bar{B}$ and $A \cap B$]

1.24. Given that P(A) = 0.9, P(B) = 0.8, and $P(A \cap B) = 0.75$, find (a) $P(A \cup B)$; (b) $P(A \cap \bar{B})$; and (c) $P(\bar{A} \cap \bar{B})$.

(a) By Eq. (1.29), we have
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.9 + 0.8 - 0.75 = 0.95$$

(b) By Eq. (1.64) (Prob. 1.23), we have
$$P(A \cap \bar{B}) = P(A) - P(A \cap B) = 0.9 - 0.75 = 0.15$$

(c) By De Morgan's law, Eq. (1.14), and Eq. (1.25) and using the result from part (a), we get
$$P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.95 = 0.05$$
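The arithmetic of Probs. 1.22 and 1.24 can be reproduced exactly with rational numbers. The following Python sketch uses `fractions.Fraction` so that results such as 0.15 are not blurred by floating-point rounding; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Given values of Prob. 1.24, stored as exact fractions.
p_a = Fraction(9, 10)
p_b = Fraction(8, 10)
p_a_and_b = Fraction(3, 4)

# (a) Eq. (1.29): probability of the union.
p_a_or_b = p_a + p_b - p_a_and_b

# (b) Eq. (1.64) rearranged: probability of A occurring without B.
p_a_and_not_b = p_a - p_a_and_b

# (c) De Morgan's law with Eq. (1.25): neither A nor B occurs.
p_neither = 1 - p_a_or_b

# Bonferroni's inequality, Eq. (1.63): lower bound on P(A ∩ B).
bonferroni_bound = p_a + p_b - 1

print(float(p_a_or_b), float(p_a_and_not_b), float(p_neither))
```

The given joint probability of 0.75 sits above the Bonferroni bound, so the data of Prob. 1.24 are consistent with the inequality shown in Prob. 1.22.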

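1.25. For any three events $A_1$, $A_2$, and $A_3$, show that
$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$$

Let $B = A_2 \cup A_3$. By Eq. (1.29), we have
$$P(A_1 \cup B) = P(A_1) + P(B) - P(A_1 \cap B) \tag{1.67}$$
Using distributive law (1.12), we have
$$A_1 \cap B = A_1 \cap (A_2 \cup A_3) = (A_1 \cap A_2) \cup (A_1 \cap A_3)$$
Applying Eq. (1.29) to the above event, we obtain
$$\begin{aligned} P(A_1 \cap B) &= P(A_1 \cap A_2) + P(A_1 \cap A_3) - P[(A_1 \cap A_2) \cap (A_1 \cap A_3)] \\ &= P(A_1 \cap A_2) + P(A_1 \cap A_3) - P(A_1 \cap A_2 \cap A_3) \end{aligned} \tag{1.68}$$
Applying Eq. (1.29) to the set $B = A_2 \cup A_3$, we have
$$P(B) = P(A_2 \cup A_3) = P(A_2) + P(A_3) - P(A_2 \cap A_3) \tag{1.69}$$
Substituting Eqs. (1.69) and (1.68) into Eq. (1.67), we get
$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$$

1.26. Prove that
$$P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i) \tag{1.70}$$
which is known as Boole's inequality.

We will prove Eq. (1.70) by induction. Suppose Eq. (1.70) is true for n = k:
$$P\left(\bigcup_{i=1}^{k} A_i\right) \le \sum_{i=1}^{k} P(A_i)$$
Then, using Eq. (1.33),
$$P\left(\bigcup_{i=1}^{k+1} A_i\right) = P\left[\left(\bigcup_{i=1}^{k} A_i\right) \cup A_{k+1}\right] \le P\left(\bigcup_{i=1}^{k} A_i\right) + P(A_{k+1}) \le \sum_{i=1}^{k} P(A_i) + P(A_{k+1}) = \sum_{i=1}^{k+1} P(A_i)$$
Thus Eq. (1.70) is also true for n = k + 1. By Eq. (1.33), Eq. (1.70) is true for n = 2. Thus, Eq. (1.70) is true for $n \ge 2$.

1.27. Verify Eq. (1.31).

Again we prove it by induction. Suppose Eq. (1.31) is true for n = k:
$$P\left(\bigcup_{i=1}^{k} A_i\right) = \sum_{i=1}^{k} P(A_i)$$
Then
$$P\left(\bigcup_{i=1}^{k+1} A_i\right) = P\left[\left(\bigcup_{i=1}^{k} A_i\right) \cup A_{k+1}\right]$$
Using the distributive law (1.16), we have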

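$$\left(\bigcup_{i=1}^{k} A_i\right) \cap A_{k+1} = \bigcup_{i=1}^{k} (A_i \cap A_{k+1}) = \bigcup_{i=1}^{k} \varnothing = \varnothing$$
since $A_i \cap A_j = \varnothing$ for $i \ne j$. Thus, by axiom 3, we have
$$P\left(\bigcup_{i=1}^{k+1} A_i\right) = P\left(\bigcup_{i=1}^{k} A_i\right) + P(A_{k+1}) = \sum_{i=1}^{k+1} P(A_i)$$
which indicates that Eq. (1.31) is also true for n = k + 1. By axiom 3, Eq. (1.31) is true for n = 2. Thus, it is true for $n \ge 2$.

1.28. A sequence of events $\{A_n, n \ge 1\}$ is said to be an increasing sequence if [Fig. 1-10(a)]
$$A_1 \subset A_2 \subset \cdots \subset A_k \subset A_{k+1} \subset \cdots$$
whereas it is said to be a decreasing sequence if [Fig. 1-10(b)]
$$A_1 \supset A_2 \supset \cdots \supset A_k \supset A_{k+1} \supset \cdots$$
If $\{A_n, n \ge 1\}$ is an increasing sequence of events, we define a new event $A_\infty$ by
$$A_\infty = \lim_{n \to \infty} A_n = \bigcup_{i=1}^{\infty} A_i$$
Similarly, if $\{A_n, n \ge 1\}$ is a decreasing sequence of events, we define a new event $A_\infty$ by
$$A_\infty = \lim_{n \to \infty} A_n = \bigcap_{i=1}^{\infty} A_i$$
Show that if $\{A_n, n \ge 1\}$ is either an increasing or a decreasing sequence of events, then
$$\lim_{n \to \infty} P(A_n) = P(A_\infty) \tag{1.74}$$
which is known as the continuity theorem of probability.

If $\{A_n, n \ge 1\}$ is an increasing sequence of events, then by definition
$$\bigcup_{i=1}^{n-1} A_i = A_{n-1}$$
Now, we define the events $B_n$, $n \ge 1$, by
$$B_1 = A_1 \qquad B_n = A_n \cap \bar{A}_{n-1} \quad n \ge 2$$
Thus, $B_n$ consists of those elements in $A_n$ that are not in any of the earlier $A_k$, $k < n$. From the Venn diagram shown in Fig. 1-11, it is seen that the $B_n$ are mutually exclusive events such that
$$\bigcup_{i=1}^{n} B_i = \bigcup_{i=1}^{n} A_i \quad \text{for all } n \ge 1, \qquad \text{and} \qquad \bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i = A_\infty$$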

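[Fig. 1-11: regions $A_1$, $A_2 \cap \bar{A}_1$, $A_3 \cap \bar{A}_2$]

Thus, using axiom 3', we have
$$P(A_\infty) = P\left(\bigcup_{i=1}^{\infty} A_i\right) = P\left(\bigcup_{i=1}^{\infty} B_i\right) = \sum_{i=1}^{\infty} P(B_i) = \lim_{n \to \infty} \sum_{i=1}^{n} P(B_i) = \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} B_i\right) = \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} A_i\right) = \lim_{n \to \infty} P(A_n) \tag{1.75}$$
Next, if $\{A_n, n \ge 1\}$ is a decreasing sequence, then $\{\bar{A}_n, n \ge 1\}$ is an increasing sequence. Hence, by Eq. (1.75), we have
$$P\left(\bigcup_{i=1}^{\infty} \bar{A}_i\right) = \lim_{n \to \infty} P(\bar{A}_n)$$
From Eq. (1.19),
$$\bigcup_{i=1}^{\infty} \bar{A}_i = \overline{\bigcap_{i=1}^{\infty} A_i}$$
Thus,
$$P\left(\overline{\bigcap_{i=1}^{\infty} A_i}\right) = \lim_{n \to \infty} P(\bar{A}_n) \tag{1.76}$$
Using Eq. (1.25), Eq. (1.76) reduces to
$$1 - P\left(\bigcap_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} [1 - P(A_n)] = 1 - \lim_{n \to \infty} P(A_n)$$
Thus,
$$P\left(\bigcap_{i=1}^{\infty} A_i\right) = P(A_\infty) = \lim_{n \to \infty} P(A_n) \tag{1.77}$$
Combining Eqs. (1.75) and (1.77), we obtain Eq. (1.74).

EQUALLY LIKELY EVENTS

1.29. Consider a telegraph source generating two symbols, dots and dashes. We observed that the dots were twice as likely to occur as the dashes. Find the probabilities of the dot's occurring and the dash's occurring.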

29 CHAP. 11 PROBABILITY * From the observation, we have P(dot) = 2P(dash) Then, by Eq. (1.39, P(dot) + P(dash) = 3P(dash) = 1 Thus, P(dash) = 5 and P(dot) = The sample space S of a random experiment is given by S = {a, b, c, d] with probabilities P(a) = 0.2, P(b) = 0.3, P(c) = 0.4, and P(d) = 0.1. Let A denote the event {a, b), and B the event {b, c, d). Determine the following probabilities: (a) P(A); (b) P(B); (c) P(A); (d) P(A u B); and (e) P(A n B). Using Eq. (1.36), we obtain (a) P(A) = P(u) + P(b) = = 0.5 (b) P(B) = P(b) + P(c) + P(d) = = 0.8 (c) A = (c, d); P(4 = P(c) + P(d) = = 0.5 (d) A u B = {a, b, c, d) = S; P(A u B) = P(S) = 1 (e) A n B=(b};P(A n B)= P(b)=O An experiment consists of observing the sum of the dice when two fair dice are thrown (Prob. 1.5). Find (a) the probability that the sum is 7 and (b) the probability that the sum is greater than Let rij denote the elementary event (sampling point) consisting of the following outcome: cij = (i, j), where i represents the number appearing on one die and j represents the number appearing on the other die. Since the dice are fair, all the outcomes are equally likely. So P(rij) = &. Let A denote the event that the sum is 7. Since the events rij are mutually exclusive and from Fig. 1-3 (Prob. IS), we have P(A) = K 16 u (25 u (34 u C43 u (52 u (6,) = p(r16) + P(C25) + p(c34) + P(c421) + p(c52) + p(661) = 6(&) = 4 Let B denote the event that the sum is greater than 10. Then from Fig. 1-3, we obtain There are n persons in a room. P(B) = P(556 u c65 u (66) = PG6) -1 P(C65) + W66) = 3(&) = (a) What is the probability that at least two persons have the same birthday? (b) Calculate this probability for n = 50. (c) How large need n be for this probability to be greater than 0.5? (a) As each person can have his or her birthday on any one of 365 days (ignoring the possibility of February 29), there are a total of (365)" possible outcomes. Let A be the event that no two persons have the same birthday. 
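Then the number of outcomes belonging to A is

$$n(A) = (365)(364) \cdots (365 - n + 1)$$

Assuming that each outcome is equally likely, then by Eq. (1.38),

$$P(A) = \frac{n(A)}{n(S)} = \frac{(365)(364) \cdots (365 - n + 1)}{(365)^n} \qquad (1.78)$$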

Let B be the event that at least two persons have the same birthday. Then $B = \bar{A}$ and by Eq. (1.25), $P(B) = 1 - P(A)$.

(b) Substituting n = 50 in Eq. (1.78), we have

$$P(A) \approx 0.03 \quad \text{and} \quad P(B) \approx 1 - 0.03 = 0.97$$

(c) From Eq. (1.78), when n = 23, we have

$$P(A) \approx 0.493 \quad \text{and} \quad P(B) = 1 - P(A) \approx 0.507$$

That is, if there are 23 persons in a room, the probability that at least two of them have the same birthday exceeds 0.5.

A committee of 5 persons is to be selected randomly from a group of 5 men and 10 women.
(a) Find the probability that the committee consists of 2 men and 3 women.
(b) Find the probability that the committee consists of all women.

(a) The number of total outcomes is given by

$$n = \binom{15}{5} = 3003$$

It is assumed that "random selection" means that each of the outcomes is equally likely. Let A be the event that the committee consists of 2 men and 3 women. Then the number of outcomes belonging to A is given by

$$n(A) = \binom{5}{2}\binom{10}{3} = (10)(120) = 1200$$

Thus, by Eq. (1.38),

$$P(A) = \frac{n(A)}{n} = \frac{1200}{3003} = \frac{400}{1001} \approx 0.4$$

(b) Let B be the event that the committee consists of all women. Then the number of outcomes belonging to B is

$$n(B) = \binom{5}{0}\binom{10}{5} = 252$$

Thus, by Eq. (1.38),

$$P(B) = \frac{n(B)}{n} = \frac{252}{3003} = \frac{12}{143} \approx 0.084$$

Consider the switching network shown in Fig. 1-12. It is equally likely that a switch will or will not work. Find the probability that a closed path will exist between terminals a and b.

Fig. 1-12
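The birthday and committee computations above are easy to check numerically. The following is a minimal sketch, not part of the original text; the function names `p_shared_birthday`, `smallest_group`, and `p_committee` are hypothetical, and the 365-day, equally-likely assumption is the one stated in the problem.

```python
from math import comb, prod


def p_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n persons share a birthday), all days equally likely."""
    if n > days:
        return 1.0
    # P(no two share) = (365/365)(364/365)...((365-n+1)/365)
    p_all_distinct = prod((days - i) / days for i in range(n))
    return 1.0 - p_all_distinct


def smallest_group(threshold: float = 0.5) -> int:
    """Smallest n for which the shared-birthday probability exceeds threshold."""
    n = 1
    while p_shared_birthday(n) <= threshold:
        n += 1
    return n


def p_committee(men_chosen: int, women_chosen: int,
                men: int = 5, women: int = 10) -> float:
    """P(committee has the given composition) under equally likely selection."""
    size = men_chosen + women_chosen
    return comb(men, men_chosen) * comb(women, women_chosen) / comb(men + women, size)


if __name__ == "__main__":
    print(p_shared_birthday(50))   # close to 0.97
    print(smallest_group())        # group size at which the probability passes 0.5
    print(p_committee(2, 3), p_committee(0, 5))
```

Working with the product of ratios rather than the raw counts avoids forming the very large integer $(365)^n$.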

Consider a sample space S of which a typical outcome is (1, 0, 0, 1), indicating that switches 1 and 4 are closed and switches 2 and 3 are open. The sample space contains $2^4 = 16$ points, and by assumption, they are equally likely (Fig. 1-13). Let $A_i$, i = 1, 2, 3, 4, be the event that the switch $s_i$ is closed. Let A be the event that there exists a closed path between a and b. Then

$$A = A_1 \cup (A_2 \cap A_3) \cup (A_2 \cap A_4)$$

Applying Eq. (1.30), we have

$$P(A) = P(A_1) + P(A_2 \cap A_3) + P(A_2 \cap A_4) - P(A_1 \cap A_2 \cap A_3) - P(A_1 \cap A_2 \cap A_4) - P(A_2 \cap A_3 \cap A_4) + P(A_1 \cap A_2 \cap A_3 \cap A_4)$$

Now, for example, the event $A_2 \cap A_3$ contains all elementary events with a 1 in the second and third places. Thus, from Fig. 1-13, we see that

$$n(A_1) = 8 \quad n(A_2 \cap A_3) = 4 \quad n(A_2 \cap A_4) = 4$$
$$n(A_1 \cap A_2 \cap A_3) = 2 \quad n(A_1 \cap A_2 \cap A_4) = 2 \quad n(A_2 \cap A_3 \cap A_4) = 2 \quad n(A_1 \cap A_2 \cap A_3 \cap A_4) = 1$$

Thus,

$$P(A) = \frac{8}{16} + \frac{4}{16} + \frac{4}{16} - \frac{2}{16} - \frac{2}{16} - \frac{2}{16} + \frac{1}{16} = \frac{11}{16} \approx 0.688$$

Fig. 1-13

Consider the experiment of tossing a fair coin repeatedly and counting the number of tosses required until the first head appears.
(a) Find the sample space of the experiment.
(b) Find the probability that the first head appears on the kth toss.
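The inclusion-exclusion count for the switching network can be cross-checked by enumerating all 16 equally likely switch states directly. This is a sketch only; the helper names `path_closed` and `p_closed_path` are hypothetical, and the network topology is taken from the event expression $A = A_1 \cup (A_2 \cap A_3) \cup (A_2 \cap A_4)$ rather than from the figure.

```python
from fractions import Fraction
from itertools import product


def path_closed(s1: int, s2: int, s3: int, s4: int) -> bool:
    """True if a closed path exists: switch 1 alone, or switch 2 with 3 or 4."""
    return bool(s1 or (s2 and (s3 or s4)))


def p_closed_path() -> Fraction:
    """Exact probability of a closed path when each switch state is equally likely."""
    outcomes = list(product((0, 1), repeat=4))  # 1 = closed, 0 = open
    favorable = sum(path_closed(*outcome) for outcome in outcomes)
    return Fraction(favorable, len(outcomes))


if __name__ == "__main__":
    print(p_closed_path())
```

Brute-force enumeration is practical here because the sample space has only $2^4$ points; it also sidesteps the bookkeeping of the seven inclusion-exclusion terms.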

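(c) Verify that P(S) = 1.

(a) The sample space of this experiment is

$$S = \{e_1, e_2, e_3, \ldots\} = \{e_k : k = 1, 2, 3, \ldots\}$$

where $e_k$ is the elementary event that the first head appears on the kth toss.

(b) Since a fair coin is tossed, we assume that a head and a tail are equally likely to appear. Then $P(H) = P(T) = \frac{1}{2}$. Let

$$P(e_k) = p_k \qquad k = 1, 2, 3, \ldots$$

Since there are $2^k$ equally likely ways of tossing a fair coin k times, only one of which consists of (k - 1) tails followed by a head, we observe that

$$P(e_k) = p_k = \frac{1}{2^k} \qquad k = 1, 2, 3, \ldots \qquad (1.79)$$

(c) Using the power series summation formula, we have

$$P(S) = \sum_{k=1}^{\infty} P(e_k) = \sum_{k=1}^{\infty} \frac{1}{2^k} = \frac{\frac{1}{2}}{1 - \frac{1}{2}} = 1$$

Consider the experiment of Prob. 1.35.
(a) Find the probability that the first head appears on an even-numbered toss.
(b) Find the probability that the first head appears on an odd-numbered toss.

(a) Let A be the event "the first head appears on an even-numbered toss." Then, by Eq. (1.36) and using Eq. (1.79) of Prob. 1.35, we have

$$P(A) = p_2 + p_4 + p_6 + \cdots = \sum_{m=1}^{\infty} \frac{1}{2^{2m}} = \sum_{m=1}^{\infty} \left(\frac{1}{4}\right)^m = \frac{\frac{1}{4}}{1 - \frac{1}{4}} = \frac{1}{3}$$

(b) Let B be the event "the first head appears on an odd-numbered toss." Then it is obvious that $B = \bar{A}$. Then, by Eq. (1.25), we get

$$P(B) = P(\bar{A}) = 1 - P(A) = 1 - \frac{1}{3} = \frac{2}{3}$$

As a check, notice that

$$P(B) = p_1 + p_3 + p_5 + \cdots = \sum_{m=0}^{\infty} \frac{1}{2^{2m+1}} = \frac{1}{2} \sum_{m=0}^{\infty} \left(\frac{1}{4}\right)^m = \frac{1}{2} \cdot \frac{1}{1 - \frac{1}{4}} = \frac{2}{3}$$

CONDITIONAL PROBABILITY

Show that $P(A \mid B)$ defined by Eq. (1.39) satisfies the three axioms of a probability, that is,
(a) $P(A \mid B) \ge 0$
(b) $P(S \mid B) = 1$
(c) $P(A_1 \cup A_2 \mid B) = P(A_1 \mid B) + P(A_2 \mid B)$ if $A_1 \cap A_2 = \varnothing$

(a) From definition (1.39),

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad P(B) > 0$$

By axiom 1, $P(A \cap B) \ge 0$. Thus, $P(A \mid B) \ge 0$.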

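(b) By Eq. (1.5), $S \cap B = B$. Then

$$P(S \mid B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$$

(c) By definition (1.39),

$$P(A_1 \cup A_2 \mid B) = \frac{P[(A_1 \cup A_2) \cap B]}{P(B)}$$

Now by Eqs. (1.8) and (1.11), we have

$$(A_1 \cup A_2) \cap B = (A_1 \cap B) \cup (A_2 \cap B)$$

and $A_1 \cap A_2 = \varnothing$ implies that $(A_1 \cap B) \cap (A_2 \cap B) = \varnothing$. Thus, by axiom 3 we get

$$P(A_1 \cup A_2 \mid B) = \frac{P(A_1 \cap B) + P(A_2 \cap B)}{P(B)} = P(A_1 \mid B) + P(A_2 \mid B) \qquad \text{if } A_1 \cap A_2 = \varnothing$$

Find $P(A \mid B)$ if (a) $A \cap B = \varnothing$, (b) $A \subset B$, and (c) $B \subset A$.

(a) If $A \cap B = \varnothing$, then $P(A \cap B) = P(\varnothing) = 0$. Thus,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\varnothing)}{P(B)} = 0$$

(b) If $A \subset B$, then $A \cap B = A$ and

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)}$$

(c) If $B \subset A$, then $A \cap B = B$ and

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$$

Show that if $P(A \mid B) > P(A)$, then $P(B \mid A) > P(B)$.

If $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} > P(A)$, then $P(A \cap B) > P(A)P(B)$. Thus,

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} > \frac{P(A)P(B)}{P(A)} = P(B) \qquad \text{or} \qquad P(B \mid A) > P(B)$$

Consider the experiment of throwing the two fair dice of Prob. 1.31 behind you; you are then informed that the sum is not greater than 3.
(a) Find the probability of the event that two faces are the same without the information given.
(b) Find the probability of the same event with the information given.

(a) Let A be the event that two faces are the same. Then from Fig. 1-3 (Prob. 1.5) and by Eq. (1.38), we have

$$A = \{(i, i) : i = 1, 2, \ldots, 6\} \qquad \text{and} \qquad P(A) = \frac{n(A)}{n(S)} = \frac{6}{36} = \frac{1}{6}$$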

(b) Let B be the event that the sum is not greater than 3. Again from Fig. 1-3, we see that

$$B = \{(i, j) : i + j \le 3\} = \{(1, 1), (1, 2), (2, 1)\} \qquad \text{and} \qquad P(B) = \frac{n(B)}{n(S)} = \frac{3}{36} = \frac{1}{12}$$

Now $A \cap B$ is the event that two faces are the same and also that their sum is not greater than 3. Thus,

$$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{1}{36}$$

Then by definition (1.39), we obtain

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{\frac{1}{36}}{\frac{1}{12}} = \frac{1}{3}$$

Note that the probability of the event that two faces are the same doubled from $\frac{1}{6}$ to $\frac{1}{3}$ with the information given.

Alternative Solution: There are 3 elements in B, and 1 of them belongs to A. Thus, the probability of the same event with the information given is $\frac{1}{3}$.

Two manufacturing plants produce similar parts. Plant 1 produces 1,000 parts, 100 of which are defective. Plant 2 produces 2,000 parts, 150 of which are defective. A part is selected at random and found to be defective. What is the probability that it came from plant 1?

Let B be the event that "the part selected is defective," and let A be the event that "the part selected came from plant 1." Then $A \cap B$ is the event that the item selected is defective and came from plant 1. Since a part is selected at random, we assume equally likely events, and using Eq. (1.38), we have

$$P(A \cap B) = \frac{100}{3000} = \frac{1}{30}$$

Similarly, since there are 3000 parts and 250 of them are defective, we have

$$P(B) = \frac{250}{3000} = \frac{1}{12}$$

By Eq. (1.39), the probability that the part came from plant 1 is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{\frac{1}{30}}{\frac{1}{12}} = \frac{12}{30} = 0.4$$

Alternative Solution: There are 250 defective parts, and 100 of these are from plant 1. Thus, the probability that the defective part came from plant 1 is $\frac{100}{250} = 0.4$.

A lot of 100 semiconductor chips contains 20 that are defective. Two chips are selected at random, without replacement, from the lot.
(a) What is the probability that the first one selected is defective?
(b) What is the probability that the second one selected is defective given that the first one was defective?
(c) What is the probability that both are defective?

(a) Let A denote the event that the first one selected is defective. Then, by Eq. (1.38),

$$P(A) = \frac{20}{100} = 0.2$$

(b) Let B denote the event that the second one selected is defective. After the first one selected is defective, there are 99 chips left in the lot with 19 chips that are defective. Thus, the probability that the second one selected is defective given that the first one was defective is

$$P(B \mid A) = \frac{19}{99} \approx 0.192$$

(c) By Eq. (1.41), the probability that both are defective is

$$P(A \cap B) = P(B \mid A)P(A) = \left(\frac{19}{99}\right)(0.2) \approx 0.0384$$

A number is selected at random from $\{1, 2, \ldots, 100\}$. Given that the number selected is divisible by 2, find the probability that it is divisible by 3 or 5.

Let
$A_2$ = event that the number is divisible by 2
$A_3$ = event that the number is divisible by 3
$A_5$ = event that the number is divisible by 5

Then the desired probability is

$$P(A_3 \cup A_5 \mid A_2) = \frac{P[(A_3 \cup A_5) \cap A_2]}{P(A_2)} = \frac{P[(A_3 \cap A_2) \cup (A_5 \cap A_2)]}{P(A_2)} = \frac{P(A_3 \cap A_2) + P(A_5 \cap A_2) - P(A_3 \cap A_5 \cap A_2)}{P(A_2)} \qquad [\text{Eq. } (1.29)]$$

Now
$A_3 \cap A_2$ = event that the number is divisible by 6
$A_5 \cap A_2$ = event that the number is divisible by 10
$A_3 \cap A_5 \cap A_2$ = event that the number is divisible by 30

and

$$P(A_3 \cap A_2) = \frac{16}{100} \qquad P(A_5 \cap A_2) = \frac{10}{100} \qquad P(A_3 \cap A_5 \cap A_2) = \frac{3}{100}$$

Thus,

$$P(A_3 \cup A_5 \mid A_2) = \frac{\frac{16}{100} + \frac{10}{100} - \frac{3}{100}}{\frac{50}{100}} = \frac{23}{50} = 0.46$$

Let $A_1, A_2, \ldots, A_n$ be events in a sample space S. Show that

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}) \qquad (1.81)$$

We prove Eq. (1.81) by induction. Suppose Eq. (1.81) is true for n = k:

$$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_k \mid A_1 \cap A_2 \cap \cdots \cap A_{k-1})$$

Multiplying both sides by $P(A_{k+1} \mid A_1 \cap A_2 \cap \cdots \cap A_k)$, we have

$$P(A_1 \cap A_2 \cap \cdots \cap A_k)P(A_{k+1} \mid A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1 \cap A_2 \cap \cdots \cap A_{k+1})$$

and

$$P(A_1 \cap A_2 \cap \cdots \cap A_{k+1}) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_{k+1} \mid A_1 \cap A_2 \cap \cdots \cap A_k)$$

Thus, Eq. (1.81) is also true for n = k + 1. By Eq. (1.41), Eq. (1.81) is true for n = 2. Thus Eq. (1.81) is true for $n \ge 2$.

Two cards are drawn at random from a deck. Find the probability that both are aces.

Let A be the event that the first card is an ace, and B be the event that the second card is an ace. The desired probability is $P(B \cap A)$.
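Since a card is drawn at random, $P(A) = \frac{4}{52}$. Now if the first card is an ace, then there will be 3 aces left in the deck of 51 cards. Thus $P(B \mid A) = \frac{3}{51}$. By Eq. (1.41),

$$P(B \cap A) = P(B \mid A)P(A) = \left(\frac{3}{51}\right)\left(\frac{4}{52}\right) = \frac{1}{221}$$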

Check: By counting technique, we have

$$P(B \cap A) = \frac{\binom{4}{2}}{\binom{52}{2}} = \frac{(4)(3)}{(52)(51)} = \frac{1}{221}$$

There are two identical decks of cards, each possessing a distinct symbol so that the cards from each deck can be identified. One deck of cards is laid out in a fixed order, and the other deck is shuffled and the cards laid out one by one on top of the fixed deck. Whenever two cards with the same symbol occur in the same position, we say that a match has occurred. Let the number of cards in the deck be 10. Find the probability of getting a match at the first four positions.

Let $A_i$, i = 1, 2, 3, 4, be the events that a match occurs at the ith position. The required probability is

$$P(A_1 \cap A_2 \cap A_3 \cap A_4)$$

By Eq. (1.81),

$$P(A_1 \cap A_2 \cap A_3 \cap A_4) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2)P(A_4 \mid A_1 \cap A_2 \cap A_3)$$

There are 10 cards that can go into position 1, only one of which matches. Thus, $P(A_1) = \frac{1}{10}$. $P(A_2 \mid A_1)$ is the conditional probability of a match at position 2 given a match at position 1. Now there are 9 cards left to go into position 2, only one of which matches. Thus, $P(A_2 \mid A_1) = \frac{1}{9}$. In a similar fashion, we obtain $P(A_3 \mid A_1 \cap A_2) = \frac{1}{8}$ and $P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac{1}{7}$. Thus,

$$P(A_1 \cap A_2 \cap A_3 \cap A_4) = \left(\frac{1}{10}\right)\left(\frac{1}{9}\right)\left(\frac{1}{8}\right)\left(\frac{1}{7}\right) = \frac{1}{5040}$$

TOTAL PROBABILITY

Verify Eq. (1.44).

Since $B \cap S = B$ [and using Eq. (1.43)], we have

$$B = B \cap S = B \cap (A_1 \cup A_2 \cup \cdots \cup A_n) = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)$$

Now the events $B \cap A_i$, i = 1, 2, ..., n, are mutually exclusive, as seen from the Venn diagram of Fig. 1-14. Then by axiom 3 of probability and Eq. (1.41), we obtain

$$P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B \mid A_i)P(A_i)$$

Fig. 1-14
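The card-matching result obtained from the multiplication rule (1.81) can be confirmed two ways with exact rational arithmetic. The sketch below is illustrative only; `p_match_first` and `p_match_first_by_counting` are hypothetical names. The first function multiplies the conditional probabilities one position at a time; the second counts the shuffles that fix the first k positions, (n - k)! out of n!.

```python
from fractions import Fraction
from math import factorial


def p_match_first(k: int, n: int = 10) -> Fraction:
    """P(matches at positions 1..k) via the chain of conditional probabilities."""
    p = Fraction(1)
    for i in range(k):
        # Given matches so far, n - i cards remain and exactly one of them matches.
        p *= Fraction(1, n - i)
    return p


def p_match_first_by_counting(k: int, n: int = 10) -> Fraction:
    """Same probability by counting: (n - k)! favorable shuffles out of n!."""
    return Fraction(factorial(n - k), factorial(n))


if __name__ == "__main__":
    print(p_match_first(4), p_match_first_by_counting(4))
```

Agreement between the two functions for every k from 0 to n is a quick sanity check on the conditional-probability argument.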

Show that for any events A and B in S,

$$P(B) = P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A}) \qquad (1.83)$$

From Eq. (1.64) (Prob. 1.23), we have

$$P(B) = P(B \cap A) + P(B \cap \bar{A})$$

Using Eq. (1.39), we obtain

$$P(B) = P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})$$

Note that Eq. (1.83) is the special case of Eq. (1.44).

Suppose that a laboratory test to detect a certain disease has the following statistics. Let
A = event that the tested person has the disease
B = event that the test result is positive
It is known that

$$P(B \mid A) = 0.99 \qquad \text{and} \qquad P(B \mid \bar{A}) = 0.005$$

and 0.1 percent of the population actually has the disease. What is the probability that a person has the disease given that the test result is positive?

From the given statistics, we have $P(A) = 0.001$; then $P(\bar{A}) = 0.999$. The desired probability is $P(A \mid B)$. Thus, using Eqs. (1.42) and (1.83), we obtain

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})} = \frac{(0.99)(0.001)}{(0.99)(0.001) + (0.005)(0.999)} \approx 0.165$$

Note that in only 16.5 percent of the cases where the tests are positive will the person actually have the disease, even though the test is 99 percent effective in detecting the disease when it is, in fact, present.

A company producing electric relays has three manufacturing plants producing 50, 30, and 20 percent, respectively, of its product. Suppose that the probabilities that a relay manufactured by these plants is defective are 0.02, 0.05, and 0.01, respectively.
(a) If a relay is selected at random from the output of the company, what is the probability that it is defective?
(b) If a relay selected at random is found to be defective, what is the probability that it was manufactured by plant 2?

(a) Let B be the event that the relay is defective, and let $A_i$ be the event that the relay is manufactured by plant i (i = 1, 2, 3). The desired probability is P(B). Using Eq. (1.44), we have

$$P(B) = \sum_{i=1}^{3} P(B \mid A_i)P(A_i) = (0.02)(0.5) + (0.05)(0.3) + (0.01)(0.2) = 0.027$$

(b) The desired probability is $P(A_2 \mid B)$. Using Eq. (1.42) and the result from part (a), we obtain

$$P(A_2 \mid B) = \frac{P(B \mid A_2)P(A_2)}{P(B)} = \frac{(0.05)(0.3)}{0.027} \approx 0.556$$

Two numbers are chosen at random from among the numbers 1 to 10 without replacement. Find the probability that the second number chosen is 5.

Let $A_i$, i = 1, 2, ..., 10, denote the event that the first number chosen is i. Let B be the event that the second number chosen is 5. Then by Eq. (1.44),

$$P(B) = \sum_{i=1}^{10} P(B \mid A_i)P(A_i)$$

Now $P(A_i) = \frac{1}{10}$. $P(B \mid A_i)$ is the probability that the second number chosen is 5, given that the first is i. If i = 5, then $P(B \mid A_i) = 0$. If $i \ne 5$, then $P(B \mid A_i) = \frac{1}{9}$. Hence,

$$P(B) = \sum_{i=1}^{10} P(B \mid A_i)P(A_i) = 9\left(\frac{1}{9}\right)\left(\frac{1}{10}\right) = \frac{1}{10}$$

Consider the binary communication channel shown in Fig. 1-15. The channel input symbol X may assume the state 0 or the state 1, and, similarly, the channel output symbol Y may assume either the state 0 or the state 1. Because of the channel noise, an input 0 may convert to an output 1 and vice versa. The channel is characterized by the channel transition probabilities $p_0$, $q_0$, $p_1$, and $q_1$, defined by

$$p_0 = P(y_1 \mid x_0) \qquad p_1 = P(y_0 \mid x_1) \qquad q_0 = P(y_0 \mid x_0) \qquad q_1 = P(y_1 \mid x_1)$$

where $x_0$ and $x_1$ denote the events (X = 0) and (X = 1), respectively, and $y_0$ and $y_1$ denote the events (Y = 0) and (Y = 1), respectively. Note that $p_0 + q_0 = 1 = p_1 + q_1$. Let $P(x_0) = 0.5$, $p_0 = 0.1$, and $p_1 = 0.2$.
(a) Find $P(y_0)$ and $P(y_1)$.
(b) If a 0 was observed at the output, what is the probability that a 0 was the input state?
(c) If a 1 was observed at the output, what is the probability that a 1 was the input state?
(d) Calculate the probability of error $P_e$.

Fig. 1-15

(a) We note that

$$P(x_1) = 1 - P(x_0) = 0.5 \qquad P(y_0 \mid x_0) = q_0 = 1 - p_0 = 0.9 \qquad P(y_1 \mid x_1) = q_1 = 1 - p_1 = 0.8$$

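Using Eq. (1.44), we obtain

$$P(y_0) = P(y_0 \mid x_0)P(x_0) + P(y_0 \mid x_1)P(x_1) = 0.9(0.5) + 0.2(0.5) = 0.55$$
$$P(y_1) = P(y_1 \mid x_0)P(x_0) + P(y_1 \mid x_1)P(x_1) = 0.1(0.5) + 0.8(0.5) = 0.45$$

(b) Using Bayes' rule (1.42), we have

$$P(x_0 \mid y_0) = \frac{P(x_0)P(y_0 \mid x_0)}{P(y_0)} = \frac{(0.5)(0.9)}{0.55} \approx 0.818$$

(c) Similarly,

$$P(x_1 \mid y_1) = \frac{P(x_1)P(y_1 \mid x_1)}{P(y_1)} = \frac{(0.5)(0.8)}{0.45} \approx 0.889$$

(d) The probability of error is

$$P_e = P(y_1 \mid x_0)P(x_0) + P(y_0 \mid x_1)P(x_1) = 0.1(0.5) + 0.2(0.5) = 0.15$$

INDEPENDENT EVENTS

Let A and B be events in a sample space S. Show that if A and B are independent, then so are (a) A and $\bar{B}$, (b) $\bar{A}$ and B, and (c) $\bar{A}$ and $\bar{B}$.

(a) From Eq. (1.64) (Prob. 1.23), we have

$$P(A) = P(A \cap B) + P(A \cap \bar{B})$$

Since A and B are independent, using Eqs. (1.46) and (1.25), we obtain

$$P(A \cap \bar{B}) = P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)[1 - P(B)] = P(A)P(\bar{B}) \qquad (1.84)$$

Thus, by definition (1.46), A and $\bar{B}$ are independent.

(b) Interchanging A and B in Eq. (1.84), we obtain

$$P(B \cap \bar{A}) = P(B)P(\bar{A})$$

which indicates that $\bar{A}$ and B are independent.

(c) We have

$$P(\bar{A} \cap \bar{B}) = P[\overline{(A \cup B)}] \qquad \text{(De Morgan's law)}$$
$$= 1 - P(A \cup B) \qquad [\text{Eq. } (1.25)]$$
$$= 1 - P(A) - P(B) + P(A \cap B) \qquad [\text{Eq. } (1.29)]$$
$$= 1 - P(A) - P(B) + P(A)P(B) \qquad [\text{Eq. } (1.46)]$$
$$= 1 - P(A) - P(B)[1 - P(A)] = [1 - P(A)][1 - P(B)] = P(\bar{A})P(\bar{B}) \qquad [\text{Eq. } (1.25)]$$

Hence, $\bar{A}$ and $\bar{B}$ are independent.

Let A and B be events defined in a sample space S. Show that if both P(A) and P(B) are nonzero, then events A and B cannot be both mutually exclusive and independent.

Let A and B be mutually exclusive events with $P(A) \ne 0$ and $P(B) \ne 0$. Then $P(A \cap B) = P(\varnothing) = 0$ but $P(A)P(B) \ne 0$. Since

$$P(A \cap B) \ne P(A)P(B)$$

A and B cannot be independent.

Show that if three events A, B, and C are independent, then A and $(B \cup C)$ are independent.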
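The binary-channel computation above combines the total probability formula (1.44) with Bayes' rule (1.42), and it translates directly into a few lines of code. This is a sketch under the problem's stated definitions, with $p_0 = P(y_1 \mid x_0)$ and $p_1 = P(y_0 \mid x_1)$; the function name `binary_channel` and the dictionary keys are hypothetical.

```python
def binary_channel(p_x0: float, p0: float, p1: float) -> dict:
    """Output, posterior, and error probabilities for a binary channel.

    p_x0 : prior probability that the input is 0
    p0   : P(output 1 | input 0), the crossover probability for a 0
    p1   : P(output 0 | input 1), the crossover probability for a 1
    """
    p_x1 = 1.0 - p_x0
    q0, q1 = 1.0 - p0, 1.0 - p1          # correct-transmission probabilities

    # Total probability, Eq. (1.44)
    p_y0 = q0 * p_x0 + p1 * p_x1
    p_y1 = p0 * p_x0 + q1 * p_x1

    return {
        "P(y0)": p_y0,
        "P(y1)": p_y1,
        # Bayes' rule, Eq. (1.42)
        "P(x0|y0)": q0 * p_x0 / p_y0,
        "P(x1|y1)": q1 * p_x1 / p_y1,
        # An error occurs when the output differs from the input
        "Pe": p0 * p_x0 + p1 * p_x1,
    }


if __name__ == "__main__":
    for name, value in binary_channel(0.5, 0.1, 0.2).items():
        print(f"{name} = {value:.3f}")
```

Changing `p_x0` shows how strongly the posteriors depend on the input prior even when the channel itself is unchanged.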

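We have

$$P[A \cap (B \cup C)] = P[(A \cap B) \cup (A \cap C)] \qquad [\text{Eq. } (1.12)]$$
$$= P(A \cap B) + P(A \cap C) - P(A \cap B \cap C) \qquad [\text{Eq. } (1.29)]$$
$$= P(A)P(B) + P(A)P(C) - P(A)P(B)P(C) \qquad \text{(by independence)}$$
$$= P(A)P(B) + P(A)P(C) - P(A)P(B \cap C) \qquad [\text{Eq. } (1.50)]$$
$$= P(A)[P(B) + P(C) - P(B \cap C)] = P(A)P(B \cup C) \qquad [\text{Eq. } (1.29)]$$

Thus, A and $(B \cup C)$ are independent.

Consider the experiment of throwing two fair dice (Prob. 1.31). Let A be the event that the sum of the dice is 7, B be the event that the sum of the dice is 6, and C be the event that the first die is 4. Show that events A and C are independent, but events B and C are not independent.

From Fig. 1-3 (Prob. 1.5), we see that

$$A = \{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\} \qquad B = \{(1,5), (2,4), (3,3), (4,2), (5,1)\} \qquad C = \{(4,1), (4,2), (4,3), (4,4), (4,5), (4,6)\}$$

and

$$A \cap C = \{(4,3)\} \qquad B \cap C = \{(4,2)\}$$

Now

$$P(A) = \frac{6}{36} = \frac{1}{6} \qquad P(C) = \frac{6}{36} = \frac{1}{6}$$

and

$$P(A \cap C) = \frac{1}{36} = P(A)P(C)$$

Thus, events A and C are independent. But

$$P(B) = \frac{5}{36} \qquad P(B \cap C) = \frac{1}{36} \ne P(B)P(C) = \frac{5}{216}$$

Thus, events B and C are not independent.

In the experiment of throwing two fair dice, let A be the event that the first die is odd, B be the event that the second die is odd, and C be the event that the sum is odd. Show that events A, B, and C are pairwise independent, but A, B, and C are not independent.

From Fig. 1-3 (Prob. 1.5), we see that

$$P(A) = P(B) = P(C) = \frac{18}{36} = \frac{1}{2} \qquad P(A \cap B) = P(A \cap C) = P(B \cap C) = \frac{9}{36} = \frac{1}{4}$$

Thus

$$P(A \cap B) = \tfrac{1}{4} = P(A)P(B) \qquad P(A \cap C) = \tfrac{1}{4} = P(A)P(C) \qquad P(B \cap C) = \tfrac{1}{4} = P(B)P(C)$$

which indicates that A, B, and C are pairwise independent. However, since the sum of two odd numbers is even, $A \cap B \cap C = \varnothing$ and

$$P(A \cap B \cap C) = 0 \ne \tfrac{1}{8} = P(A)P(B)P(C)$$

which shows that A, B, and C are not independent.

A system consisting of n separate components is said to be a series system if it functions when all n components function (Fig. 1-16). Assume that the components fail independently and that the probability of failure of component i is $p_i$, i = 1, 2, ..., n. Find the probability that the system functions.

Fig. 1-16 Series system.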

Let $A_i$ be the event that component $s_i$ functions. Then

$$P(A_i) = 1 - P(\bar{A}_i) = 1 - p_i$$

Let A be the event that the system functions. Then, since the $A_i$'s are independent, we obtain

$$P(A) = P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i) = \prod_{i=1}^{n} (1 - p_i) \qquad (1.85)$$

A system consisting of n separate components is said to be a parallel system if it functions when at least one of the components functions (Fig. 1-17). Assume that the components fail independently and that the probability of failure of component i is $p_i$, i = 1, 2, ..., n. Find the probability that the system functions.

Fig. 1-17 Parallel system.

Let $A_i$ be the event that component $s_i$ functions. Then

$$P(\bar{A}_i) = p_i$$

Let A be the event that the system functions. Then, since the $A_i$'s are independent, we obtain

$$P(A) = 1 - P(\bar{A}) = 1 - P\left(\bigcap_{i=1}^{n} \bar{A}_i\right) = 1 - \prod_{i=1}^{n} p_i \qquad (1.86)$$

Using Eqs. (1.85) and (1.86), redo Prob. 1.34.

From Prob. 1.34, $p_i = \frac{1}{2}$, i = 1, 2, 3, 4, where $p_i$ is the probability of failure of switch $s_i$. Let A be the event that there exists a closed path between a and b. Using Eq. (1.86), the probability of failure for the parallel combination of switches 3 and 4 is

$$p_{34} = p_3 p_4 = \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) = \tfrac{1}{4}$$

Using Eq. (1.85), the probability of failure for the combination of switches 2, 3, and 4 is

$$p_{234} = 1 - \left(1 - \tfrac{1}{2}\right)\left(1 - \tfrac{1}{4}\right) = 1 - \tfrac{3}{8} = \tfrac{5}{8}$$

Again, using Eq. (1.86), we obtain

$$P(A) = 1 - p_1 p_{234} = 1 - \left(\tfrac{1}{2}\right)\left(\tfrac{5}{8}\right) = \tfrac{11}{16} \approx 0.688$$

A Bernoulli experiment is a random experiment, the outcome of which can be classified in but one of two mutually exclusive and exhaustive ways, say success or failure. A sequence of Bernoulli trials occurs when a Bernoulli experiment is performed several independent times so that the probability of success, say p, remains the same from trial to trial. Now an infinite sequence of Bernoulli trials is performed. Find the probability that (a) at least 1 success occurs in the first n trials; (b) exactly k successes occur in the first n trials; (c) all trials result in successes.

(a) In order to find the probability of at least 1 success in the first n trials, it is easier to first compute the probability of the complementary event, that of no successes in the first n trials. Let $A_i$ denote the event

of a failure on the ith trial. Then the probability of no successes is, by independence,

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2) \cdots P(A_n) = (1 - p)^n \qquad (1.87)$$

Hence, the probability that at least 1 success occurs in the first n trials is $1 - (1 - p)^n$.

(b) In any particular sequence of the first n outcomes, if k successes occur, where k = 0, 1, 2, ..., n, then n - k failures occur. There are $\binom{n}{k}$ such sequences, and each one of these has probability $p^k(1 - p)^{n-k}$. Thus, the probability that exactly k successes occur in the first n trials is given by $\binom{n}{k} p^k (1 - p)^{n-k}$.

(c) Since $\bar{A}_i$ denotes the event of a success on the ith trial, the probability that all trials resulted in successes in the first n trials is, by independence,

$$P(\bar{A}_1 \cap \bar{A}_2 \cap \cdots \cap \bar{A}_n) = P(\bar{A}_1)P(\bar{A}_2) \cdots P(\bar{A}_n) = p^n \qquad (1.88)$$

Hence, using the continuity theorem of probability (1.74) (Prob. 1.28), the probability that all trials result in successes is given by

$$P\left(\bigcap_{i=1}^{\infty} \bar{A}_i\right) = P\left(\lim_{n\to\infty} \bigcap_{i=1}^{n} \bar{A}_i\right) = \lim_{n\to\infty} P\left(\bigcap_{i=1}^{n} \bar{A}_i\right) = \lim_{n\to\infty} p^n = \begin{cases} 0 & p < 1 \\ 1 & p = 1 \end{cases}$$

Let S be the sample space of an experiment and $S = \{A, B, C\}$, where P(A) = p, P(B) = q, and P(C) = r. The experiment is repeated infinitely, and it is assumed that the successive experiments are independent. Find the probability of the event that A occurs before B.

Suppose that A occurs for the first time at the nth trial of the experiment. If A is to have occurred before B, then C must have occurred on the first (n - 1) trials. Let D be the event that A occurs before B. Then

$$D = \bigcup_{n=1}^{\infty} D_n$$

where $D_n$ is the event that C occurs on the first (n - 1) trials and A occurs on the nth trial. Since the $D_n$'s are mutually exclusive, we have

$$P(D) = \sum_{n=1}^{\infty} P(D_n)$$

Since the trials are independent, we have

$$P(D_n) = [P(C)]^{n-1}P(A) = r^{n-1}p$$

Thus,

$$P(D) = \sum_{n=1}^{\infty} r^{n-1}p = \frac{p}{1 - r} = \frac{p}{p + q} \qquad \text{or} \qquad P(D) = \frac{P(A)}{P(A) + P(B)} \qquad (1.89)$$

In a gambling game, craps, a pair of dice is rolled and the outcome of the experiment is the sum of the dice. The player wins on the first roll if the sum is 7 or 11 and loses if the sum is 2, 3, or 12. If the sum is 4, 5, 6, 8, 9, or 10, that number is called the player's "point."
Once the point is established, the rule is: If the player rolls a 7 before the point, the player loses; but if the point is rolled before a 7, the player wins. Compute the probability of winning in the game of craps.

Let A, B, and C be the events that the player wins, the player wins on the first roll, and the player wins after establishing a point, respectively. Then P(A) = P(B) + P(C). Now from Fig. 1-3 (Prob. 1.5),

$$P(B) = P(\text{sum} = 7) + P(\text{sum} = 11) = \frac{6}{36} + \frac{2}{36} = \frac{2}{9}$$

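Let $A_k$ be the event that point of k occurs before 7. Then

$$P(C) = \sum_{k \in \{4, 5, 6, 8, 9, 10\}} P(A_k)P(\text{point} = k)$$

By Eq. (1.89) (Prob. 1.62),

$$P(A_k) = \frac{P(\text{sum} = k)}{P(\text{sum} = k) + P(\text{sum} = 7)} \qquad (1.90)$$

Again from Fig. 1-3,

$$P(\text{sum} = 4) = \tfrac{3}{36} \quad P(\text{sum} = 5) = \tfrac{4}{36} \quad P(\text{sum} = 6) = \tfrac{5}{36} \quad P(\text{sum} = 8) = \tfrac{5}{36} \quad P(\text{sum} = 9) = \tfrac{4}{36} \quad P(\text{sum} = 10) = \tfrac{3}{36}$$

Now by Eq. (1.90),

$$P(A_4) = \tfrac{1}{3} \quad P(A_5) = \tfrac{2}{5} \quad P(A_6) = \tfrac{5}{11} \quad P(A_8) = \tfrac{5}{11} \quad P(A_9) = \tfrac{2}{5} \quad P(A_{10}) = \tfrac{1}{3}$$

Using these values, we obtain

$$P(A) = P(B) + P(C) = \frac{2}{9} + \frac{134}{495} = \frac{244}{495} \approx 0.493$$

Supplementary Problems

Consider the experiment of selecting items from a group consisting of three items {a, b, c}.
(a) Find the sample space $S_1$ of the experiment in which two items are selected without replacement.
(b) Find the sample space $S_2$ of the experiment in which two items are selected with replacement.
Ans. (a) $S_1$ = {ab, ac, ba, bc, ca, cb}
(b) $S_2$ = {aa, ab, ac, ba, bb, bc, ca, cb, cc}

Let A and B be arbitrary events. Then show that $A \subset B$ if and only if $A \cup B = B$.
Hint: Draw a Venn diagram.

Let A and B be events in the sample space S. Show that if $A \subset B$, then $\bar{B} \subset \bar{A}$.
Hint: Draw a Venn diagram.

Verify Eq. (1.13).
Hint: Draw a Venn diagram.

Let A and B be any two events in S. The difference of B and A, denoted by B - A, is defined as

$$B - A = B \cap \bar{A}$$

The symmetric difference of A and B, denoted by $A \,\Delta\, B$, is defined by

$$A \,\Delta\, B = (A - B) \cup (B - A)$$

Show that

$$A \,\Delta\, B = (A \cup B) \cap (\bar{A} \cup \bar{B})$$

Hint: Draw a Venn diagram.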

Let A and B be any two events in S. Express the following events in terms of A and B.
(a) At least one of the events occurs.
(b) Exactly one of two events occurs.
Ans. (a) $A \cup B$; (b) $A \,\Delta\, B$

Let A, B, and C be any three events in S. Express the following events in terms of these events.
(a) Either B or C occurs, but not A.
(b) Exactly one of the events occurs.
(c) Exactly two of the events occur.
Ans. (a) $\bar{A} \cap (B \cup C)$
(b) $\{A \cap \overline{(B \cup C)}\} \cup \{B \cap \overline{(A \cup C)}\} \cup \{C \cap \overline{(A \cup B)}\}$
(c) $\{(A \cap B) \cap \bar{C}\} \cup \{(A \cap C) \cap \bar{B}\} \cup \{(B \cap C) \cap \bar{A}\}$

A random experiment has sample space S = {a, b, c}. Suppose that P({a, c}) = 0.75 and P({b, c}) = 0.6. Find the probabilities of the elementary events.
Ans. P(a) = 0.4, P(b) = 0.25, P(c) = 0.35

Show that
(a) $P(\bar{A} \cup \bar{B}) = 1 - P(A \cap B)$
(b) $P(A \cap B) \ge 1 - P(\bar{A}) - P(\bar{B})$
Hint: (a) Use Eqs. (1.15) and (1.25). (b) Use Eqs. (1.29), (1.25), and (1.28). (c) See Prob. and use axiom 3.

Let A, B, and C be three events in S. If $P(A) = P(B) = \frac{1}{4}$, $P(C) = \frac{1}{3}$, $P(A \cap B) = \frac{1}{8}$, $P(A \cap C) = \frac{1}{6}$, and $P(B \cap C) = 0$, find $P(A \cup B \cup C)$.
Ans. $\frac{13}{24}$

Verify Eq. (1.30).
Hint: Prove by induction.

Show that

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) \ge P(A_1) + P(A_2) + \cdots + P(A_n) - (n - 1)$$

Hint: Use induction to generalize Bonferroni's inequality (1.63) (Prob. 1.22).

In an experiment consisting of 10 throws of a pair of fair dice, find the probability of the event that at least one double 6 occurs.
Ans. 0.246

Show that if P(A) > P(B), then $P(A \mid B) > P(B \mid A)$.
Hint: Use Eqs. (1.39) and (1.40).

An urn contains 8 white balls and 4 red balls. The experiment consists of drawing 2 balls from the urn without replacement. Find the probability that both balls drawn are white.
Ans. 0.424

There are 100 patients in a hospital with a certain disease. Of these, 10 are selected to undergo a drug treatment that increases the percentage cured rate from 50 percent to 75 percent. What is the probability that the patient received a drug treatment if the patient is known to be cured?
Ans. 0.143

Two boys and two girls enter a music hall and take four seats at random in a row. What is the probability that the girls take the two end seats?
Ans. $\frac{1}{6}$

Let A and B be two independent events in S. It is known that $P(A \cap B) = 0.16$ and $P(A \cup B) = 0.64$. Find P(A) and P(B).
Ans. P(A) = P(B) = 0.4

The relay network shown in Fig. 1-18 operates if and only if there is a closed path of relays from left to right. Assume that relays fail independently and that the probability of failure of each relay is as shown. What is the probability that the relay network operates?
Ans.

Fig. 1-18
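Returning to the craps problem solved earlier in this chapter, the winning probability can be recomputed exactly from the 36 equally likely outcomes of two fair dice. The sketch below is illustrative; `sum_probs` and `p_win_craps` are hypothetical names, and the only ingredients are the first-roll rule and the "A before B" result $P(A)/[P(A) + P(B)]$ of Eq. (1.89).

```python
from fractions import Fraction
from itertools import product


def sum_probs() -> dict:
    """Exact distribution of the sum of two fair dice."""
    counts = {}
    for i, j in product(range(1, 7), repeat=2):
        counts[i + j] = counts.get(i + j, 0) + 1
    return {s: Fraction(c, 36) for s, c in counts.items()}


def p_win_craps() -> Fraction:
    """P(win) = P(7 or 11 on first roll) + sum over points of P(point) * P(point before 7)."""
    p = sum_probs()
    win_first_roll = p[7] + p[11]
    win_on_point = sum(
        p[k] * p[k] / (p[k] + p[7])      # Eq. (1.89) applied to "k before 7"
        for k in (4, 5, 6, 8, 9, 10)
    )
    return win_first_roll + win_on_point


if __name__ == "__main__":
    result = p_win_craps()
    print(result, float(result))
```

Using `Fraction` keeps every intermediate value exact, so the result can be compared digit for digit with a hand calculation.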

Chapter 2

Random Variables

2.1 INTRODUCTION

In this chapter, the concept of a random variable is introduced. The main purpose of using a random variable is so that we can define certain probability functions that make it both convenient and easy to compute the probabilities of various events.

2.2 RANDOM VARIABLES

A. Definitions:

Consider a random experiment with sample space S. A random variable $X(\zeta)$ is a single-valued real function that assigns a real number called the value of $X(\zeta)$ to each sample point $\zeta$ of S. Often, we use a single letter X for this function in place of $X(\zeta)$ and use r.v. to denote the random variable.

Note that the terminology used here is traditional. Clearly a random variable is not a variable at all in the usual sense; it is a function.

The sample space S is termed the domain of the r.v. X, and the collection of all numbers [values of $X(\zeta)$] is termed the range of the r.v. X. Thus the range of X is a certain subset of the set of all real numbers (Fig. 2-1).

Note that two or more different sample points might give the same value of $X(\zeta)$, but two different numbers in the range cannot be assigned to the same sample point.

Fig. 2-1 Random variable X as a function.

EXAMPLE 2.1 In the experiment of tossing a coin once (Example 1.1), we might define the r.v. X as (Fig. 2-2)

$$X(H) = 1 \qquad X(T) = 0$$

Note that we could also define another r.v., say Y or Z, with

$$Y(H) = 0,\ Y(T) = 1 \qquad \text{or} \qquad Z(H) = 0,\ Z(T) = 0$$

B. Events Defined by Random Variables:

If X is a r.v. and x is a fixed real number, we can define the event (X = x) as

$$(X = x) = \{\zeta : X(\zeta) = x\}$$

Similarly, for fixed numbers x, $x_1$, and $x_2$, we can define the following events:

$$(X \le x) = \{\zeta : X(\zeta) \le x\}$$
$$(X > x) = \{\zeta : X(\zeta) > x\}$$
$$(x_1 < X \le x_2) = \{\zeta : x_1 < X(\zeta) \le x_2\}$$

Fig. 2-2 One random variable associated with coin tossing.

These events have probabilities that are denoted by

$$P(X = x) = P\{\zeta : X(\zeta) = x\}$$
$$P(X \le x) = P\{\zeta : X(\zeta) \le x\}$$
$$P(X > x) = P\{\zeta : X(\zeta) > x\}$$
$$P(x_1 < X \le x_2) = P\{\zeta : x_1 < X(\zeta) \le x_2\}$$

EXAMPLE 2.2 In the experiment of tossing a fair coin three times (Prob. 1.1), the sample space $S_1$ consists of eight equally likely sample points $S_1$ = {HHH, ..., TTT}. If X is the r.v. giving the number of heads obtained, find (a) P(X = 2); (b) P(X < 2).

(a) Let $A \subset S_1$ be the event defined by X = 2. Then, from Prob. 1.1, we have

$$A = (X = 2) = \{\zeta : X(\zeta) = 2\} = \{HHT, HTH, THH\}$$

Since the sample points are equally likely, we have

$$P(X = 2) = P(A) = \tfrac{3}{8}$$

(b) Let $B \subset S_1$ be the event defined by X < 2. Then

$$B = (X < 2) = \{\zeta : X(\zeta) < 2\} = \{HTT, THT, TTH, TTT\}$$

and

$$P(X < 2) = P(B) = \tfrac{4}{8} = \tfrac{1}{2}$$

2.3 DISTRIBUTION FUNCTIONS

A. Definition:

The distribution function [or cumulative distribution function (cdf)] of X is the function defined by

$$F_X(x) = P(X \le x) \qquad -\infty < x < \infty \qquad (2.4)$$

Most of the information about a random experiment described by the r.v. X is determined by the behavior of $F_X(x)$.

B. Properties of $F_X(x)$:

Several properties of $F_X(x)$ follow directly from its definition (2.4).

1. $0 \le F_X(x) \le 1$
2. $F_X(x_1) \le F_X(x_2)$ if $x_1 < x_2$
3. $\lim_{x \to \infty} F_X(x) = F_X(\infty) = 1$
4. $\lim_{x \to -\infty} F_X(x) = F_X(-\infty) = 0$
5. $\lim_{x \to a^+} F_X(x) = F_X(a^+) = F_X(a)$, where $a^+ = \lim_{0 < \varepsilon \to 0} (a + \varepsilon)$

Property 1 follows because $F_X(x)$ is a probability. Property 2 shows that $F_X(x)$ is a nondecreasing function (Prob. 2.5). Properties 3 and 4 follow from Eqs. (1.22) and (1.26):

$$\lim_{x \to \infty} P(X \le x) = P(X \le \infty) = P(S) = 1$$
$$\lim_{x \to -\infty} P(X \le x) = P(X \le -\infty) = P(\varnothing) = 0$$

Property 5 indicates that $F_X(x)$ is continuous on the right. This is the consequence of the definition (2.4).

Table 2.1

   x    (X <= x)                                   F_X(x)
  -1    (empty set)                                0
   0    {TTT}                                      1/8
   1    {TTT, TTH, THT, HTT}                       4/8 = 1/2
   2    {TTT, TTH, THT, HTT, HHT, HTH, THH}        7/8
   3    S                                          1
   4    S                                          1

EXAMPLE 2.3 Consider the r.v. X defined in Example 2.2. Find and sketch the cdf $F_X(x)$ of X.

Table 2.1 gives $F_X(x) = P(X \le x)$ for x = -1, 0, 1, 2, 3, 4. Since the value of X must be an integer, the value of $F_X(x)$ for noninteger values of x must be the same as the value of $F_X(x)$ for the nearest smaller integer value of x. The $F_X(x)$ is sketched in Fig. 2-3. Note that $F_X(x)$ has jumps at x = 0, 1, 2, 3, and that at each jump the upper value is the correct value for $F_X(x)$.

Fig. 2-3

C. Determination of Probabilities from the Distribution Function:

From definition (2.4), we can compute other probabilities, such as $P(a < X \le b)$, $P(X > a)$, and $P(X < b)$ (Prob. 2.6):

$$P(a < X \le b) = F_X(b) - F_X(a)$$
$$P(X > a) = 1 - F_X(a)$$
$$P(X < b) = F_X(b^-) \qquad b^- = \lim_{0 < \varepsilon \to 0} (b - \varepsilon)$$
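The cdf of Example 2.3 and the interval probabilities of Sec. 2.3C can be reproduced by enumerating the eight outcomes of three coin tosses. This is a minimal sketch; the names `pmf_heads`, `cdf`, and `prob_interval` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def pmf_heads(n_tosses: int = 3) -> dict:
    """P(X = k) for X = number of heads in n_tosses tosses of a fair coin."""
    pmf = {}
    for outcome in product("HT", repeat=n_tosses):
        k = outcome.count("H")
        pmf[k] = pmf.get(k, Fraction(0)) + Fraction(1, 2 ** n_tosses)
    return pmf


def cdf(pmf: dict, x: float) -> Fraction:
    """F_X(x) = P(X <= x): sum the probability mass at all points not exceeding x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))


def prob_interval(pmf: dict, a: float, b: float) -> Fraction:
    """P(a < X <= b) = F_X(b) - F_X(a)."""
    return cdf(pmf, b) - cdf(pmf, a)


if __name__ == "__main__":
    pmf = pmf_heads(3)
    for x in (-1, 0, 1, 1.5, 2, 3, 4):
        print(x, cdf(pmf, x))
    print(prob_interval(pmf, 0, 2))
```

Evaluating `cdf` at a noninteger such as 1.5 returns the value at the nearest smaller integer, which is the staircase behavior described in Example 2.3.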

2.4 DISCRETE RANDOM VARIABLES AND PROBABILITY MASS FUNCTIONS

A. Definition:

Let X be a r.v. with cdf $F_X(x)$. If $F_X(x)$ changes values only in jumps (at most a countable number of them) and is constant between jumps, that is, $F_X(x)$ is a staircase function (see Fig. 2-3), then X is called a discrete random variable. Alternatively, X is a discrete r.v. only if its range contains a finite or countably infinite number of points. The r.v. X in Example 2.3 is an example of a discrete r.v.

B. Probability Mass Functions:

Suppose that the jumps in $F_X(x)$ of a discrete r.v. X occur at the points $x_1, x_2, \ldots$, where the sequence may be either finite or countably infinite, and we assume $x_i < x_j$ if i < j. Then

$$F_X(x_i) - F_X(x_{i-1}) = P(X \le x_i) - P(X \le x_{i-1}) = P(X = x_i) \qquad (2.13)$$

Let

$$p_X(x) = P(X = x) \qquad (2.14)$$

The function $p_X(x)$ is called the probability mass function (pmf) of the discrete r.v. X.

Properties of $p_X(x)$:

1. $0 \le p_X(x_k) \le 1$, k = 1, 2, ...
2. $p_X(x) = 0$ if $x \ne x_k$ (k = 1, 2, ...)
3. $\sum_k p_X(x_k) = 1$

The cdf $F_X(x)$ of a discrete r.v. X can be obtained by

$$F_X(x) = P(X \le x) = \sum_{x_k \le x} p_X(x_k) \qquad (2.18)$$

2.5 CONTINUOUS RANDOM VARIABLES AND PROBABILITY DENSITY FUNCTIONS

A. Definition:

Let X be a r.v. with cdf $F_X(x)$. If $F_X(x)$ is continuous and also has a derivative $dF_X(x)/dx$ which exists everywhere except at possibly a finite number of points and is piecewise continuous, then X is called a continuous random variable. Alternatively, X is a continuous r.v. only if its range contains an interval (either finite or infinite) of real numbers. Thus, if X is a continuous r.v., then (Prob. 2.18)

$$P(X = x) = 0 \qquad (2.19)$$

Note that this is an example of an event with probability 0 that is not necessarily the impossible event $\varnothing$.

In most applications, the r.v. is either discrete or continuous. But if the cdf $F_X(x)$ of a r.v. X possesses features of both discrete and continuous r.v.'s, then the r.v. X is called the mixed r.v. (Prob. 2.10).

B. Probability Density Functions:

Let

$$f_X(x) = \frac{dF_X(x)}{dx}$$

The function $f_X(x)$ is called the probability density function (pdf) of the continuous r.v. X.

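Properties of $f_X(x)$:

1. $f_X(x) \ge 0$
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
3. $f_X(x)$ is piecewise continuous.
4. $P(a < X \le b) = \int_a^b f_X(x)\,dx$

The cdf $F_X(x)$ of a continuous r.v. X can be obtained by

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(\xi)\,d\xi$$

By Eq. (2.19), if X is a continuous r.v., then

$$P(a < X \le b) = P(a \le X \le b) = P(a \le X < b) = P(a < X < b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a)$$

2.6 MEAN AND VARIANCE

A. Mean:

The mean (or expected value) of a r.v. X, denoted by $\mu_X$ or E(X), is defined by

$$\mu_X = E(X) = \begin{cases} \sum_k x_k\,p_X(x_k) & X\text{: discrete} \\ \int_{-\infty}^{\infty} x f_X(x)\,dx & X\text{: continuous} \end{cases}$$

B. Moment:

The nth moment of a r.v. X is defined by

$$E(X^n) = \begin{cases} \sum_k x_k^n\,p_X(x_k) & X\text{: discrete} \\ \int_{-\infty}^{\infty} x^n f_X(x)\,dx & X\text{: continuous} \end{cases}$$

Note that the mean of X is the first moment of X.

C. Variance:

The variance of a r.v. X, denoted by $\sigma_X^2$ or Var(X), is defined by

$$\sigma_X^2 = \text{Var}(X) = E\{[X - E(X)]^2\} \qquad (2.28)$$

Thus,

$$\sigma_X^2 = \begin{cases} \sum_k (x_k - \mu_X)^2\,p_X(x_k) & X\text{: discrete} \\ \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx & X\text{: continuous} \end{cases}$$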

Note from definition (2.28) that

$$\text{Var}(X) \ge 0$$

The standard deviation of a r.v. X, denoted by $\sigma_X$, is the positive square root of Var(X).

Expanding the right-hand side of Eq. (2.28), we can obtain the following relation:

$$\text{Var}(X) = E(X^2) - [E(X)]^2$$

which is a useful formula for determining the variance.

2.7 SOME SPECIAL DISTRIBUTIONS

In this section we present some important special distributions.

A. Bernoulli Distribution:

A r.v. X is called a Bernoulli r.v. with parameter p if its pmf is given by

$$p_X(k) = P(X = k) = p^k(1 - p)^{1-k} \qquad k = 0, 1$$

where $0 \le p \le 1$. By Eq. (2.18), the cdf $F_X(x)$ of the Bernoulli r.v. X is given by

$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - p & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases}$$

Figure 2-4 illustrates a Bernoulli distribution.

Fig. 2-4 Bernoulli distribution.

The mean and variance of the Bernoulli r.v. X are

$$\mu_X = E(X) = p \qquad \sigma_X^2 = \text{Var}(X) = p(1 - p)$$

A Bernoulli r.v. X is associated with some experiment where an outcome can be classified as either a "success" or a "failure," and the probability of a success is p and the probability of a failure is 1 - p. Such experiments are often called Bernoulli trials (Prob. 1.61).
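The definitions of mean and variance for a discrete r.v. in Sec. 2.6 can be applied mechanically to any pmf. The sketch below, with hypothetical helper names `mean`, `variance`, `variance_shortcut`, and `bernoulli_pmf`, evaluates them for a Bernoulli r.v. and checks that the defining sum and the shortcut $E(X^2) - [E(X)]^2$ agree.

```python
def mean(pmf: dict) -> float:
    """E(X) = sum over k of x_k * p_X(x_k) for a discrete pmf given as {x_k: p_k}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf: dict) -> float:
    """Var(X) from the definition E{[X - E(X)]^2}."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def variance_shortcut(pmf: dict) -> float:
    """Var(X) from the relation E(X^2) - [E(X)]^2."""
    second_moment = sum(x ** 2 * p for x, p in pmf.items())
    return second_moment - mean(pmf) ** 2


def bernoulli_pmf(p: float) -> dict:
    """pmf of a Bernoulli r.v.: P(X = 1) = p, P(X = 0) = 1 - p."""
    return {0: 1.0 - p, 1: p}


if __name__ == "__main__":
    pmf = bernoulli_pmf(0.3)
    print(mean(pmf), variance(pmf), variance_shortcut(pmf))
```

For p = 0.3 both variance routines should return p(1 - p) up to floating-point rounding; the same helpers work unchanged for any finite pmf.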

B. Binomial Distribution:

A r.v. X is called a binomial r.v. with parameters (n, p) if its pmf is given by

$$p_X(k) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \qquad k = 0, 1, \ldots, n$$

where $0 \le p \le 1$ and

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$

which is known as the binomial coefficient. The corresponding cdf of X is

$$F_X(x) = \sum_{k=0}^{m} \binom{n}{k} p^k (1 - p)^{n-k} \qquad m \le x < m + 1$$

Figure 2-5 illustrates the binomial distribution for n = 6 and p = 0.6.

Fig. 2-5 Binomial distribution with n = 6, p = 0.6. Panels (a) and (b).

The mean and variance of the binomial r.v. X are (Prob. 2.28)

$$\mu_X = E(X) = np \qquad \sigma_X^2 = \text{Var}(X) = np(1 - p)$$

A binomial r.v. X is associated with some experiments in which n independent Bernoulli trials are performed and X represents the number of successes that occur in the n trials. Note that a Bernoulli r.v. is just a binomial r.v. with parameters (1, p).

C. Poisson Distribution:

A r.v. X is called a Poisson r.v. with parameter $\lambda$ (> 0) if its pmf is given by

$$p_X(k) = P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!} \qquad k = 0, 1, \ldots$$

The corresponding cdf of X is

$$F_X(x) = e^{-\lambda} \sum_{k=0}^{n} \frac{\lambda^k}{k!} \qquad n \le x < n + 1$$

Figure 2-6 illustrates the Poisson distribution for $\lambda$ = 3.

Fig. 2-6 Poisson distribution with $\lambda$ = 3.

The mean and variance of the Poisson r.v. X are (Prob. 2.29)

$$\mu_X = E(X) = \lambda \tag{2.42}$$

$$\sigma_X^2 = \mathrm{Var}(X) = \lambda \tag{2.43}$$

The Poisson r.v. has a tremendous range of applications in diverse areas because it may be used as an approximation for a binomial r.v. with parameters (n, p) when n is large and p is small enough so that np is of a moderate size (Prob. 2.40). Some examples of Poisson r.v.'s include

1. The number of telephone calls arriving at a switching center during various intervals of time
2. The number of misprints on a page of a book
3. The number of customers entering a bank during various intervals of time

D. Uniform Distribution:

A r.v. X is called a uniform r.v. over (a, b) if its pdf is given by

$$f_X(x) = \begin{cases} \dfrac{1}{b - a} & a < x < b \\ 0 & \text{otherwise} \end{cases} \tag{2.44}$$

The corresponding cdf of X is

$$F_X(x) = \begin{cases} 0 & x \le a \\ \dfrac{x - a}{b - a} & a < x < b \\ 1 & x \ge b \end{cases} \tag{2.45}$$

Figure 2-7 illustrates a uniform distribution. The mean and variance of the uniform r.v. X are (Prob. 2.31)

$$\mu_X = E(X) = \frac{a + b}{2} \tag{2.46}$$

$$\sigma_X^2 = \mathrm{Var}(X) = \frac{(b - a)^2}{12} \tag{2.47}$$

Fig. 2-7 Uniform distribution over (a, b).

A uniform r.v. X is often used where we have no prior knowledge of the actual pdf and all continuous values in some range seem equally likely (Prob. 2.69).

E. Exponential Distribution:

A r.v. X is called an exponential r.v. with parameter $\lambda$ (> 0) if its pdf is given by

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & x < 0 \end{cases} \tag{2.48}$$

which is sketched in Fig. 2-8(a). The corresponding cdf of X is

$$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases} \tag{2.49}$$

which is sketched in Fig. 2-8(b).

Fig. 2-8 Exponential distribution.

The mean and variance of the exponential r.v. X are (Prob. 2.32)

$$\mu_X = E(X) = \frac{1}{\lambda} \tag{2.50}$$

$$\sigma_X^2 = \mathrm{Var}(X) = \frac{1}{\lambda^2} \tag{2.51}$$

The most interesting property of the exponential distribution is its "memoryless" property. By this we mean that if the lifetime of an item is exponentially distributed, then an item which has been in use for some hours is as good as a new item with regard to the amount of time remaining until the item fails. The exponential distribution is the only continuous distribution which possesses this property (Prob. 2.53).
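The memoryless property can be checked numerically. The sketch below compares P(X > x + t | X > t) with P(X > x) for an exponential r.v., using P(X > x) = e^{-λx} for x ≥ 0. The function names and the values of λ, x, and t are assumptions chosen only for illustration.

```python
import math


def survival(x, lam):
    """P(X > x) for an exponential r.v. with parameter lam (x >= 0)."""
    return math.exp(-lam * x)


def conditional_survival(x, t, lam):
    """P(X > x + t | X > t) = P(X > x + t) / P(X > t)."""
    return survival(x + t, lam) / survival(t, lam)


lam = 0.5  # assumed parameter
x = 2.0    # additional time of interest
for t in (0.0, 1.0, 10.0):
    # The conditional probability does not depend on the elapsed time t.
    print(t, conditional_survival(x, t, lam), survival(x, lam))
```

Each printed row shows the same two numbers up to floating-point rounding, whatever the elapsed time t.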

F. Normal (or Gaussian) Distribution:

A r.v. X is called a normal (or gaussian) r.v. if its pdf is given by

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/(2\sigma^2)} \tag{2.52}$$

The corresponding cdf of X is

$$F_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-(\xi - \mu)^2/(2\sigma^2)}\,d\xi \tag{2.53}$$

This integral cannot be evaluated in a closed form and must be evaluated numerically. It is convenient to use the function $\Phi(z)$, defined as

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\xi^2/2}\,d\xi \tag{2.54}$$

to help us to evaluate the value of $F_X(x)$. Then Eq. (2.53) can be written as

$$F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) \tag{2.55}$$

Note that

$$\Phi(-z) = 1 - \Phi(z) \tag{2.56}$$

The function $\Phi(z)$ is tabulated in Table A (Appendix A). Figure 2-9 illustrates a normal distribution.

Fig. 2-9 Normal distribution.

The mean and variance of the normal r.v. X are (Prob. 2.33)

$$\mu_X = E(X) = \mu \tag{2.57}$$

$$\sigma_X^2 = \mathrm{Var}(X) = \sigma^2 \tag{2.58}$$

We shall use the notation $N(\mu; \sigma^2)$ to denote that X is normal with mean $\mu$ and variance $\sigma^2$. A normal r.v. Z with zero mean and unit variance, that is, Z = N(0; 1), is called a standard normal r.v. Note that the cdf of the standard normal r.v. is given by Eq. (2.54).

The normal r.v. is probably the most important type of continuous r.v. It has played a significant role in the study of random phenomena in nature. Many naturally occurring random phenomena are approximately normal. Another reason for the importance of the normal r.v. is a remarkable theorem called the central limit theorem. This theorem states that the sum of a large number of independent r.v.'s, under certain conditions, can be approximated by a normal r.v. (see Sec. 4.8C).
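When a table of the standard normal cdf is not at hand, Φ(z) can be evaluated through the error function, since Φ(z) = ½[1 + erf(z/√2)]. The Python sketch below uses `math.erf` from the standard library. The function names `std_normal_cdf` and `normal_cdf` and the sample arguments are assumptions made for illustration.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the cdf of the standard normal r.v. N(0; 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """F_X(x) = Phi((x - mu) / sigma) for X = N(mu; sigma^2), sigma > 0."""
    return std_normal_cdf((x - mu) / sigma)


# Symmetry relation Phi(-z) = 1 - Phi(z), checked at an arbitrary point.
z = 1.3
print(std_normal_cdf(-z), 1.0 - std_normal_cdf(z))

# cdf of N(10; 4) evaluated one standard deviation above the mean.
print(normal_cdf(12.0, 10.0, 2.0), std_normal_cdf(1.0))
```

Both printed pairs agree up to floating-point rounding, which mirrors how the tabulated Φ(z) is used for any mean and variance.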

56 48 RANDOM VARIABLES [CHAP CONDITIONAL DISTRIBUTIONS In Sec. 1.6 the conditional probability of an event A given event B is defined as The conditional cdf FX(x ( B) of a r.v. X given event B is defined by The conditional cdf F,(x 1 B) has the same properties as FX(x). (See Prob and Sec. 2.3.) In particular, F,(-coIB)=O FX(m 1 B) = 1 (2.60) P(a < X I b I B) = Fx(b I B) - Fx(a I B) (2.61) If X is a discrete r.v., then the conditional pmf p,(xk I B) is defined by If X is a continuous r.v., then the conditional pdf fx(x 1 B) is defined by Solved Problems RANDOM VARIABLES 2.1. Consider the experiment of throwing a fair die. Let X be the r.v. which assigns 1 if the number that appears is even and 0 if the number that appears is odd. (a) What is the range of X? (b) Find P(X = 1 ) and P(X = 0). The sample space S on which X is defined consists of 6 points which are equally likely: S = (1, 2, 3, 4, 5, 6) (a) The range of X is R, = (0, 1). (b) (X = 1) = (2, 4, 6). Thus, P(X = 1) = 2 = +. Similarly, (X = 0) = (1, 3,5), and P(X = 0) = Consider the experiment of tossing a coin three times (Prob. 1.1). Let X be the r.v. giving the number of heads obtained. We assume that the tosses are independent and the probability of a head is p. (a) What is the range of X? (b) Find the probabilities P(X = 0), P(X = I ), P(X = 2), and P(X = 3). The sample space S on which X is defined consists of eight sample points (Prob. 1.1): (a) The range of X is R, = (0, 1, 2, 3). S= {HHH, HHT,..., TTT)

57 CHAP. 21 RANDOM VARIABLES (b) If P(H) = p, then P(T) = 1 - p. Since the tosses are independent, we have 2.3. An information source generates symbols at random from. a four-letter alphabet (a, b, c, d} with probabilities P(a) = f, P(b) = $, and P(c) = P(d) = i. A coding scheme encodes these symbols into binary codes as follows: Let X be the r.v. denoting the length of the code, that is, the number of binary symbols (bits). (a) What is the range of X? (b) Assuming that the generations of symbols are independent, find the probabilities P(X = I), P(X = 2), P(X = 3), and P(X > 3). (a) TherangeofXisR, = {1,2, 3). (b) P(X = 1) = P[{a)] = P(a) = P(X = 2) = P[(b)] = P(b) = $ P(X = 3) = P[(c, d)] = P(c) + P(d) = $ P(X > 3) = P(%) = Consider the experiment of throwing a dart onto a circular plate with unit radius. Let X be the r.v. representing the distance of the point where the dart lands from the origin of the plate. Assume that the dart always lands on the plate and that the dart is equally likely to land anywhere on the plate. (a) What is the range of X? (b) Find (i) P(X < a) and (ii) P(a < X < b), where a < b I 1. (a) The range of X is R, = (x: 0 I x < 1). (b) (i) (X < a) denotes that the point is inside the circle of radius a. Since the dart is equally likely to fall anywhere on the plate, we have (Fig. 2-10) (ii) (a < X < b) denotes the event that the point is inside the annular ring with inner radius a and outer radius b. Thus, from Fig. 2-10, we have DISTRIBUTION FUNCTION 2.5. Verify Eq. (2.6). Let x, < x,. Then (X 5 x,) is a subset of (X I x,); that is, (X I x,) c (X I x,). Then, by Eq. (1.27), we have

58 RANDOM VARIABLES [CHAP 2 Fig Verify (a) Eq. (2.1 0); (b) Eq. (2.1 1); (c) Eq. (2.1 2). (a) Since (X _< b) = (X I a) u (a < X _< b) and (X I a) n (a < X 5 h) we have P(X I h) = P(X 5 a) + P(u < X I b) or F,y(b) = FX(a) + P(u < X I h) Thus, P(u < X 5 b) = Fx(h) - FX(u) (b) Since (X 5 a) u (X > a) = S and (X I a) n (X > a) = a, we have P(X S a) + P(X > a) = P(S) = 1 Thus, (c) Now P(X > a) = 1 - P(X 5 a) = 1 - Fx(u) P(X < h) = P[lim X 5 h - E] = lim P(X I b - E) c-0 c+o c>o E>O = lim Fx(h - E) = Fx(b - ) 8-0 8: > Show that (a) P(a i X i b) = P(X = a) + Fx(b) - Fx(a) (b) P(a < X < b) = Fx(b) - F,(a) - P(X = h) (c) P(a i X < b) = P(X = u) + Fx(b) - Fx(a) - P(X = b) (a) Using Eqs. (1.23) and (2.10), we have P(a I X I h) = P[(X = u) u (a < X I b)] = P(X = u) + P(a < X 5 b) = P(X = a) + F,y(h) - FX(a) (b) We have P(a < X 5 b) = P[(u < X c h) u (X = b)] = P(u < X < h) + P(X = b)

59 CHAP. 21 RANDOM VARIABLES Again using Eq. (2.10), we obtain P(a < X < b) = P(a < X I b) - P(X = b) = Fx(b) - Fx(a) - P(X = b) Similarly, P(a I X I b) = P[(a I X < b) u (X = b)] = P(a I X < b) + P(X = b) Using Eq. (2.64), we obtain P(a I X < b) = P(a 5 X 5 b) - P(X = b) = P(X = a) + Fx(b) - F,(a) - P(X = b) X be the r.v. defined in Prob Sketch the cdf FX(x) of X and specify the type of X. Find (i) P(X I I), (ii) P(l < X I 2), (iii) P(X > I), and (iv) P(l I X I 2). From the result of Prob. 2.3 and Eq. (2.18), we have which is sketched in Fig The r.v. X is a discrete r.v. (i) We see that P(X 5 1) = Fx(l) = 4 (ii) By Eq. (2.1 O), P(l < X 5 2) = Fx(2) - FA1) = - 4 = (iii) By Eq. (2.1 I), P(X > 1) = 1 - Fx(l) = 1 - $ = $ (iv) By Eq. (2.64), P(l I X I 2) = P(X = 1) + Fx(2) - Fx(l) = = 3 Fig Sketch the cdf F,(x) of the r.v. X defined in Prob. 2.4 and specify the type of X. From the result of Prob. 2.4, we have FX(x)=P(XIx)= 0 x<o 1 l l x which is sketched in Fig The r.v. X is a continuous r.v.

60 RANDOM VARIABLES [CHAP 2 Fig Consider the function given by (a) Sketch F(x) and show that F(x) has the properties of a cdf discussed in Sec. 2.3B. (6) If X is the r.v. whose cdf is given by F(x), find (i) P(X I i), (ii) P(0 < X i), (iii) P(X = O), and (iv) P(0 < X < i). (c) Specify the type of X. (a) The function F(x) is sketched in Fig From Fig. 2-13, we see that 0 < F(x) < 1 and F(x) is a nondecreasing function, F(- co) = 0, F(co) = 1, F(0) = 4, and F(x) is continuous on the right. Thus, F(x) satisfies all the properties [Eqs. (2.5) to (2.91 required of a cdf. (6) (i) We have (ii) By Eq. (2.1 O), (iii) By Eq. (2.12), (iv) By Eq. (2.64), (c) The r.v. X is a mixed r.v. Fig. 2-13

61 CHAP. 21 RANDOM VARIABLES Find the values of constants a and b such that is a valid cdf. To satisfy property 1 of FX(x) [0 I FX(x) 5 11, we must ha.ve 0 5 a 5 1 and b > 0. Since b > 0, property 3 of FXfx) [Fx(~) = 1) is satisfied. It is seen that property 4 of FX(x) [F,(-m) = O] is also satisfied. For 0 5 a I 1 and b > 0, F(x) is sketched in Fig From Fig. 2-14, we see that F(x) is a nondecreasing function and continuous on the right, and properties 2 and 5 of t7,(x) are satisfied. Hence, we conclude that F(x) given is a valid cdf if 0 5 a 5 1 and b > 0. Note that if a = 0, then the r.v. X is a discrete r.v.; if a = 1, then X is a continuous r.v.; and if 0 < a < 1, then X is a mixed r.v. 0 Fig DISCRETE RANDOM VARIABLES AND PMF'S Suppose a discrete r.v. X has the following pmfs: PXW = 4 P X = ~ $ px(3) = i (a) Find and sketch the cdf F,(x) of the r.v. X. (b) Find (i) P(X _< I), (ii) P(l < X _< 3), (iii) P(l I X I 3). (a) By Eq. (2.1 8), we obtain (b) which is sketched in Fig (i) By Eq. (2.12), we see that P(X < I )= Fx(l-)=0 (ii) By Eq. (2.10), (iii) By Eq. (2.64), P(l < X I 3) = Fx(3) - Fx(l) = - 4 = 2 P(l I X I 3) = P(X = 1) + Fx(3) - Fx(l) = = 3
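The three interval probabilities in the last problem differ only in how the endpoints are treated. The sketch below makes that concrete for a discrete r.v., using exact fractions. The pmf values are an assumption chosen for illustration and are not claimed to be those of the problem; `cdf` and `cdf_left` are hypothetical helper names for F_X(x) and F_X(x^-).

```python
from fractions import Fraction

# Assumed pmf of a discrete r.v. X (illustrative values only).
pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}


def cdf(x):
    """F_X(x) = P(X <= x): sum of the pmf over all points not exceeding x."""
    return sum((p for xk, p in pmf.items() if xk <= x), Fraction(0))


def cdf_left(x):
    """F_X(x^-) = P(X < x): the left-hand limit of the cdf at x."""
    return sum((p for xk, p in pmf.items() if xk < x), Fraction(0))


p_less_than_1 = cdf_left(1)               # P(X < 1) = F_X(1^-)
p_open_closed = cdf(3) - cdf(1)           # P(1 < X <= 3) = F_X(3) - F_X(1)
p_closed = pmf[1] + cdf(3) - cdf(1)       # P(1 <= X <= 3) adds back P(X = 1)
print(p_less_than_1, p_open_closed, p_closed)
```

Including or excluding an endpoint changes the answer by exactly the probability mass at that point, which is why the distinction matters for discrete and mixed r.v.'s but not for continuous ones.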

62 RANDOM VARIABLES [CHAP 2 Fig (a) Verify that the function p(x) defined by is a pmf of a discrete r.v. X. (b) Find (i) P(X = 2), (ii) P(X I 2), (iii) P(X 2 1). (a) It is clear that 0 5 p(x) < 1 and x =o, 1, 2,... otherwise Thus, p(x) satisfies all properties of the pmf [Eqs. (2.15) to (2.17)] of a discrete r.v. X. (b) (i) By definition (2.14), (ii) By Eq. (2.1 8), P(X = 2) = p(2) = $($)2 = (iii) By Eq. (l.25), Consider the experiment of tossing an honest coin repeatedly (Prob. 1.35). Let the r.v. X denote the number of tosses required until the first head appears. (a) Find and sketch the pmf p,(x) and the cdf F,(x) of X. (b) Find (i) P(l < X s 4), (ii) P(X > 4). (a) From the result of Prob. 1.35, the pmf of X is given by Then by Eq. (2.1 8),

63 CHAP. 21 RANDOM VARIABLES These functions are sketched in Fig (b) (9 BY Eq. (2.m (ii) By Eq. (1.Z), P(l < X 1 4) = Fx(4) - Fx(X) = - 3 = P(X > 4) = 1 - P(X 5 4) = 1 - Fx(4) = 1 - = -& Fig Consider a sequence of Bernoulli trials with probability p of success. This sequence is observed until the first success occurs. Let the r.v. X denote the trial number on which this first success occurs. Then the pmf of X is given by because there must be x - 1 failures before the first success occurs on trial x. The r.v. X defined by Eq. (2.67) is called a geometric r.v. with parameter p. (a) Show that px(x) given by Eq. (2.67) satisfies Eq. (2.1 7). (b) Find the cdf F,(x) of X. (a) Recall that for a geometric series, the sum is given by Thus, (b) Using Eq. (2.68), we obtain Thus, P(X 5 k) = 1 - P(X > k) = 1 - (1 - and Fx(x)=P(X<~)=1-(1-p)" x=1,2,... Note that the r.v. X of Prob is the geometric r.v. with p == Let X be a binomial r.v. with parameters (n, p). (a) Show that p&) given by Eq. (2.36) satisfies Eq. (2.1 7).

64 RANDOM VARIABLES [CHAP 2 (b) FindP(X> l)ifn= 6andp=0.1. (a) Recall that the binomial expansion formula is given by Thus, by Eq. (2.36), (b) NOW P(X > 1 ) = 1 - P(X = 0) - P(X = 1) Let X be a Poisson r.v. with parameter A. (a) Show that p,(x) given by Eq. (2.40) satisfies Eq. (2.1 7). (b) Find P(X > 2) with 1 = 4. (h) With A = 4, we have and Thus, CONTINUOUS RANDOM VARIABLES AND PDF'S Verify Eq. (2.1 9). From Eqs. (1.27) and (2.10), we have for any E 2 0. As Fx(x) is continuous, the right-hand side of the above expression approaches 0 as E + 0. Thus, P(X = x) = The pdf.of a continuous r.v. X is given by 3 O<x<l 0 otherwise Find the corresponding cdf FX(x) and sketch fx(x) and F,(x).

65 CHAP. 21 RANDOM VARIABLES 5 7 By Eq. (2.24), the cdf of X is given by i X 0 1 x c 1 3 Fix)= [idi+l$d/=%x-i 1 5 ~ 2 1 x The functions fdx) and FAX) are sketched in Fig ~ 2 Fig Let X be a continuous r.v. X with pdf kx O<x<l fx(x) = {O otherwise where k is a constant. (a) Determine the value of k and sketch f,(x). (b) Find and sketch the corresponding cdf Fx(x). (c) Find P($ < X 12). (a) By Eq. (2.21), we must have k > 0, and by Eq. (2.22), Thus, k = 2 and which is sketched in Fig. 2-18(a). (b) By Eq. (2.24), the cdf of X is given by 2x O<x<l 0 otherwise which is sketched in Fig. 2-18(b). [[2(d(=l l i x

66 5 8 RANDOM VARIABLES [CHAP 2 Fig Show that the pdf of a normal r.v. X given by Eq. (2.52) satisfies Eq. (2.22). From Eq. (2.52), Let Then Letting x = r cos 9 and y = r sin 9 (that is, using polar coordinates), we have Thus, and Consider a function Find the value of a such that f (x) is a pdf of a continuous r.v. X.

67 CHAP. 21 RANDOM VARIABLES Iff (x) is a pdf of a continuous r.v. X, then by Eq. (2.22), we must have 1 Now by Eq. (2.52), the pdf of N(i; 4) is - e-(x-'/2)2. Thus, J;; from which we obtain a = a A r.v. X is called a Rayleigh r.v. if its pdf is given by # (a) Determine the corresponding cdf FX(x). (b) Sketch.fx(x) and FX(x) for a = 1. (a) By Eq. (2.24), the cdf of X is Let y = t2/(2a2). Then dy = (l/a2)t dt, and (b) With a = 1, we have and These functions are sketched in Fig A r.v. X is called a gamma r.v. with parameter (a, A) (a > 0 and 1 > 0) if its pdf is given by where T(a) is the gamma function defined by

68 RANDOM VARIABLES [CHAP 2 (a) Fig Rayleigh distribution with o = 1. (b) (a) Show that the gamma function has the following properties: (b) Show that the pdf given by Eq. (2.76) satisfies Eq. (2.22). (c) Plotfx(x) for (a, 1) = (1, I), (2, I), and (5,2). (a) Integrating Eq. (2.77) by parts (u = xa-', dv = e-" dx), we obtain Replacing a by a + 1 in Eq. (2.81), we get Eq. (2.78). Next, by applying Eq. (2.78) repeatedly using an integral value of a, say a = k, we obtain Since it follows that T(k + 1) = k!. Finally, by Eq. (2.77), Let y = x1i2. Then dy = ax- 'I2 dx, and in view of Eq. (2.73). (b) Now Let y = Ax. Then dy = A dx and

69 CHAP. 21 RANDOM VARIABLES fdx) X Fig Gamma distributions for selected values of a and I. (c) The pdf's fx(x) with (a, I) = (1, I), (2, I), and (5, 2) are plotted in Fig Note that when a = 1, the gamma r.v. becomes an exponential r.v. with parameter 1 [Eq. (2.48)]. MEAN AND VARIANCE Consider a discrete r.v. X whose pmf is given by (a) Plot p,(x) and find the mean and variance of X. (b) Repeat (a) if the pmf is given by (a) The pmf p,(x) is plotted in Fig. 2-21(a). By Eq. (2.26), the mean of X is By Eq. (2.29), the variance of X is p, = E(X) = #( ) = 0 ax2 = Var(X) = E[(X - px)2] = E(X2) = $[( (0)2 + = 3 (b) The pmf px(x) is plotted in Fig. 2-21(b). Again by Eqs. (2.26) and (2.29), we obtain px=e(x)=f(-2+0+2)=o Fig. 2-21

70 RANDOM VARIABLES [CHAP 2 ox2 = Var(X) = +[(- 2)2 + (0)2 + (2)'] = Note that the variance of X is a measure of the spread of a distribution about its mean Let a r.v. X denote the outcome of throwing a fair die. Find the mean and variance of X Since the die is fair, the pmf of X is px(x) = px(k) = i k = 1, 2,..., 6 By Eqs. (2.26) and (2.29), the mean and variance of X are PX= E(X)= +(I ) = g = 3.5 ax2 = 4[(1 - T) (2- $)2 + (3- $)' + (4-4)' + (5-4)' + (6- $)2] = Alternatively, the variance of X can be found as follows: E(X2) = i(12 + 2' ' )= 9 Hence, by Eq. (2.31), ox2 = E(X2) - [E(X)12 = - (5)' = Find the mean and variance of the geometric r.v. X defined by Eq. (2.67) (Prob. 2.15). To find the mean and variance of a geometric r.v. X, we need the following results about the sum of a geometric series and its first and second derivatives. Let Then By Eqs. (2.26) and (2.67), and letting q = 1 - p, the mean of X is given by where Eq. (2.83) is used with a = p and r = q. To find the variance of X, we first find E[X(X - I)]. Now, where Eq. (2.84) is used with a = pq and r = q. Since E[X(X - I)] = E(X2 - X) = E(X2) - E(X), we have Then by Eq. (2.31), the variance of X is Let X be a binomial r.v. with parameters (n, p). Verify Eqs. (2.38) and (2.39).

71 CHAP. 21 RANDOM VARIABLES By Eqs. (2.26) and (2.36), and letting q = 1 - p, we have = C k k=o n n! (n- k)! k! pkqn - Letting i = k - 1 and using Eq. (2.72), we obtain Next, n - n! k n-k - l k(k- - k)!k! k = 0 Similarly, letting i = k - 2 and using Eq. (2.72), we obtain n-2 (n- 2)! E[X(X - I)] = n(n - l)p2 C i=o (n- 2 - i)! i! ~ ~ q ~ - ~ - ~ Thus, and by Eq. (2.31), = n(n - l)p2(p + q)"-2 = n(n - l)p2 E(X2) = E[X(X - l)] + E(X) = n(n - l)p2 + np ax2 = Var(x) = n(n - l)p2 + np - ( n~)~ = np(1 - p) Let X be a Poisson r.v. with parameter 1. Verify Eqs. (2.42) and (2.43). By Eqs. (2.26) and (2.40), Next, Thus,

72 RANDOM VARIABLES [CHAP 2 and by Eq. (2.31), Find the mean and variance of the r.v. X of Prob From Prob. 2.20, the pdf of X is 2x O<x<l fx(x) = (0 otherwise By Eq. (2.26), the mean of X is By Eq. (2.27), we have Thus, by Eq. (2.31), the variance of X is Let X be a uniform r.v. over (a, b). Verify Eqs. (2.46) and (2.47). By Eqs. (2.44) and (2.26), the mean of X is By Eq. (2.27), we have Thus, by Eq. (2.31), the variance of X is Let X be an exponential r.v. X with parameter A. Verify Eqs. (2.50) and (2.51). By Eqs. (2.48) and (2.26), the mean of X is Integrating by parts (u = x, du = Re-" dx) yields Next, by Eq. (2.27), Again integrating by parts (u = x2, du = le-" dx), we obtain

73 CHAP. 21 RANDOM VARIABLES Thus, by Eq. (2.31), the variance of X is Let X = N(p; a2). Verify Eqs. (2.57) and (2.58). Using Eqs. (2.52) and (2.26), we have Writing x as (x - p) + p, we have Letting y =.x - p in the first integral, we obtain The first integral is zero, since its integrand is an odd function. Thus, by the property of pdf Eq. (2.22), we get Next, by Eq. (2.29), From Eqs. (2.22) and (2.52), we have [~e-(x-~)2/~2az) dx = Differentiating with respect to a, we obtain Multiplying both sides by a2/&, we have Thus, ax2 = Var(X) = a Find the mean and variance of a Rayleigh r.v. defined by Eq. (2.74) (Prob. 2.23). Using Eqs. (2.74) and (2.26), we have Now the variance of N(0; a2) is given by

74 RANDOM VARIABLES [CHAP 2 Since the integrand is an even function, we have Then Next, Let y = x2/(2a2). Then dy = x dx/a2, and so Hence, by Eq. (2.31), Consider a continuous r.v. X with pdf f,(x). If fx(x) = 0 for x < 0, then show that, for any a > 0, Clx P(X 2 a) 5 - a where px = E(X). This is known as the Markov inequality. From Eq. (2.23), Since fx(x) = 0 for x < 0, For any a > 0, show that where px and ax2 are the mean and variance of X, respectively. This is known as the Chebyshev inequality. From Eq. (2.23), By Eq. (2.29),

Note that by setting $a = k\sigma_X$ in Eq. (2.97), we obtain

$$P(|X - \mu_X| \ge k\sigma_X) \le \frac{1}{k^2} \tag{2.98}$$

Equation (2.98) says that the probability that a r.v. will fall k or more standard deviations from its mean is at most $1/k^2$. Notice that nothing at all is said about the distribution function of X. The Chebyshev inequality is therefore quite a general statement. However, when applied to a particular case, it may be quite weak.

SPECIAL DISTRIBUTIONS

2.37. A binary source generates digits 1 and 0 randomly with probabilities 0.6 and 0.4, respectively. (a) What is the probability that two 1s and three 0s will occur in a five-digit sequence? (b) What is the probability that at least three 1s will occur in a five-digit sequence?

(a) Let X be the r.v. denoting the number of 1s generated in a five-digit sequence. Since there are only two possible outcomes (1 or 0), the probability of generating 1 is constant, and there are five digits, it is clear that X is a binomial r.v. with parameters (n, p) = (5, 0.6). Hence, by Eq. (2.36), the probability that two 1s and three 0s will occur in a five-digit sequence is

$$P(X = 2) = \binom{5}{2} (0.6)^2 (0.4)^3 = 0.23$$

(b) The probability that at least three 1s will occur in a five-digit sequence is

$$P(X \ge 3) = 1 - P(X \le 2)$$

where

$$P(X \le 2) = \sum_{k=0}^{2} \binom{5}{k} (0.6)^k (0.4)^{5-k} = 0.317$$

Hence, $P(X \ge 3) = 1 - 0.317 = 0.683$.

2.38. A fair coin is flipped 10 times. Find the probability of the occurrence of 5 or 6 heads.

Let the r.v. X denote the number of heads occurring when a fair coin is flipped 10 times. Then X is a binomial r.v. with parameters (n, p) = (10, 1/2). Thus, by Eq. (2.36),

$$P(5 \le X \le 6) = \binom{10}{5} \left(\tfrac{1}{2}\right)^{10} + \binom{10}{6} \left(\tfrac{1}{2}\right)^{10} = \frac{252 + 210}{1024} \approx 0.4512$$

2.39. Let X be a binomial r.v. with parameters (n, p), where 0 < p < 1. Show that as k goes from 0 to n, the pmf $p_X(k)$ of X first increases monotonically and then decreases monotonically, reaching its largest value when k is the largest integer less than or equal to (n + 1)p.

By Eq. (2.36), we have

$$\frac{p_X(k)}{p_X(k - 1)} = \frac{\binom{n}{k} p^k (1 - p)^{n-k}}{\binom{n}{k-1} p^{k-1} (1 - p)^{n-k+1}} = \frac{(n - k + 1)p}{k(1 - p)} \tag{2.99}$$

Hence, $p_X(k) \ge p_X(k - 1)$ if and only if $(n - k + 1)p \ge k(1 - p)$ or $k \le (n + 1)p$. Thus, we see that $p_X(k)$ increases monotonically and reaches its maximum when k is the largest integer less than or equal to (n + 1)p and then decreases monotonically.
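The location of the binomial maximum can be confirmed by brute force. The sketch below finds the k that maximizes the pmf and compares it with the largest integer not exceeding (n + 1)p. The function names and test parameters are illustrative assumptions. The parameters are chosen so that (n + 1)p is not an integer; when it is an integer, the pmf takes the same value at k = (n + 1)p and at k - 1, and `max` would report the smaller of the two.

```python
from math import comb, floor


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial r.v. with parameters (n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_mode(n, p):
    """Smallest k in 0..n at which the binomial pmf attains its maximum."""
    return max(range(n + 1), key=lambda k: binomial_pmf(k, n, p))


for n, p in [(5, 0.6), (6, 0.6), (10, 0.5)]:
    # Brute-force maximizer versus the closed-form location floor((n + 1) p).
    print(n, p, binomial_mode(n, p), floor((n + 1) * p))
```

For each pair the two printed integers coincide, in agreement with the ratio argument of Eq. (2.99).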

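2.40. Show that the Poisson distribution can be used as a convenient approximation to the binomial distribution for large n and small p.

From Eq. (2.36), the pmf of the binomial r.v. with parameters (n, p) is

$$p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k} = \frac{n(n - 1)(n - 2) \cdots (n - k + 1)}{k!}\, p^k (1 - p)^{n-k}$$

Multiplying and dividing the right-hand side by $n^k$, we have

$$\binom{n}{k} p^k (1 - p)^{n-k} = \frac{\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k - 1}{n}\right)}{k!}\, (np)^k \left(1 - \frac{np}{n}\right)^{n-k}$$

If we let $n \to \infty$ in such a way that $np = \lambda$ remains constant, then

$$\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k - 1}{n}\right) \to 1 \qquad \left(1 - \frac{\lambda}{n}\right)^{n-k} \to e^{-\lambda}$$

where we used the fact that

$$\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}$$

Hence, in the limit as $n \to \infty$ with $np = \lambda$ (and as $p = \lambda/n \to 0$),

$$\binom{n}{k} p^k (1 - p)^{n-k} \to e^{-\lambda} \frac{\lambda^k}{k!}$$

Thus, in the case of large n and small p,

$$\binom{n}{k} p^k (1 - p)^{n-k} \approx e^{-\lambda} \frac{\lambda^k}{k!} \qquad \lambda = np \tag{2.100}$$

which indicates that the binomial distribution can be approximated by the Poisson distribution.

2.41. A noisy transmission channel has a per-digit error probability p = 0.01. (a) Calculate the probability of more than one error in 10 received digits. (b) Repeat (a), using the Poisson approximation Eq. (2.100).

(a) It is clear that the number of errors in 10 received digits is a binomial r.v. X with parameters (n, p) = (10, 0.01). Then, using Eq. (2.36), we obtain

$$P(X > 1) = 1 - P(X = 0) - P(X = 1) = 1 - (0.99)^{10} - 10(0.01)(0.99)^{9} \approx 0.0043$$

(b) Using Eq. (2.100) with $\lambda = np = 10(0.01) = 0.1$, we have

$$P(X > 1) = 1 - P(X = 0) - P(X = 1) \approx 1 - e^{-0.1} - 0.1\,e^{-0.1} \approx 0.0047$$

2.42. The number of telephone calls arriving at a switchboard during any 10-minute period is known to be a Poisson r.v. X with $\lambda$ = 2.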

77 CHAP. 21 RANDOM VARIABLES (a) Find the probability that more than three calls will arrive during any 10-minute period. (b) Find the probability that no calls will arrive during any 10-minute period. (a) From Eq. (2.40), the pmf of X is Thus, (b) P(X=0)=pdO)=e-2x k P(X > 3) = -P(X I3) = 1 - e-2 - k=o k! = 1 - e-2( ) z Consider the experiment of throwing a pair of fair dice. (a) Find the probability that it will take less than six tosses to throw a 7. (b) Find the probability that it will take more than six tosses to throw a 7. (a) From Prob. 1.31(a), we see that the probability of throwing a 7 on any toss is 4. Let X denote the number of tosses required for the first success of throwing a 7. Then, from Prob. 2.15, it is clear that X is a geometric r.v. with parameter p = 6. Thus, using Eq. (2.71) of Prob. 2.15, we obtain (b) Similarly, we get P(X > 6) = 1 - P(X 5 6) = 1 - FA6) = 1 - [l - (2)6] = (:)6 w Consider the experiment of rolling a fair die. Find the average number of rolls required in order to obtain a 6. Let X denote the number of trials (rolls) required until the number 6 first appears. Then X is a geometrical r.v. with parameter p = 4. From Eq. (2.85) of Prob. 2.27, the mean of X is given by Thus, the average number of rolls required in order to obtain a 6 is Assume that the length of a phone call in minutes is an exponential r.v. X with parameter I = $. If someone arrives at a phone booth just before you arrive, find the probability that you will have to wait (a) less than 5 minutes, and (b) between 5 and 10 minutes. (a) From Eq. (2.48), the pdf of X is Then (b) Similarly,

2.46. All manufactured devices and machines fail to work sooner or later. Suppose that the failure rate is constant and the time to failure (in hours) is an exponential r.v. X with parameter $\lambda$. (a) Measurements show that the probability that the time to failure for computer memory chips in a given class exceeds $10^4$ hours is $e^{-1}$ ($\approx 0.368$). Calculate the value of the parameter $\lambda$. (b) Using the value of the parameter $\lambda$ determined in part (a), calculate the time $x_a$ such that the probability that the time to failure is less than $x_a$ is 0.05.

(a) From Eq. (2.49), the cdf of X is given by

$$F_X(x) = 1 - e^{-\lambda x} \qquad x > 0$$

Now

$$P(X > 10^4) = 1 - P(X \le 10^4) = 1 - F_X(10^4) = e^{-\lambda 10^4} = e^{-1}$$

from which we obtain $\lambda = 10^{-4}$.

(b) We want

$$F_X(x_a) = P(X \le x_a) = 1 - e^{-10^{-4} x_a} = 0.05$$

so that $e^{-10^{-4} x_a} = 0.95$, from which we obtain

$$x_a = -10^4 \ln(0.95) \approx 513 \text{ hours}$$

2.47. A production line manufactures 1000-ohm ($\Omega$) resistors that have 10 percent tolerance. Let X denote the resistance of a resistor. Assuming that X is a normal r.v. with mean 1000 and variance 2500, find the probability that a resistor picked at random will be rejected.

Let A be the event that a resistor is rejected. Then A = {X < 900} ∪ {X > 1100}. Since {X < 900} ∩ {X > 1100} = ∅, we have

$$P(A) = P(X < 900) + P(X > 1100) = F_X(900) + [1 - F_X(1100)]$$

Since X is a normal r.v. with $\mu = 1000$ and $\sigma^2 = 2500$ ($\sigma = 50$), by Eq. (2.55) and Table A (Appendix A),

$$F_X(900) = \Phi\!\left(\frac{900 - 1000}{50}\right) = \Phi(-2) = 1 - \Phi(2) \qquad F_X(1100) = \Phi\!\left(\frac{1100 - 1000}{50}\right) = \Phi(2)$$

Thus,

$$P(A) = 2[1 - \Phi(2)] \approx 0.0455$$

2.48. The radial miss distance [in meters (m)] of the landing point of a parachuting sky diver from the center of the target area is known to be a Rayleigh r.v. X with parameter $\sigma^2 = 100$. (a) Find the probability that the sky diver will land within a radius of 10 m from the center of the target area. (b) Find the radius r such that the probability that X > r is $e^{-1}$ ($\approx 0.368$).

(a) Using Eq. (2.75) of Prob. 2.23, we obtain

$$P(X \le 10) = F_X(10) = 1 - e^{-100/200} = 1 - e^{-0.5} \approx 0.393$$

(b) Now

$$P(X > r) = 1 - P(X \le r) = 1 - F_X(r) = 1 - (1 - e^{-r^2/200}) = e^{-r^2/200} = e^{-1}$$

from which we obtain $r^2 = 200$ and $r = \sqrt{200} \approx 14.14$ m.
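The exponential and Rayleigh calculations in the last problems can be reproduced in a few lines. The sketch below assumes the exponential cdf F(x) = 1 - e^{-λx} and the Rayleigh cdf F(x) = 1 - e^{-x²/(2σ²)} for x ≥ 0. The variable and function names are illustrative assumptions.

```python
import math

# Memory chips: P(X > 1e4) = exp(-lam * 1e4) = exp(-1) gives lam = 1e-4.
lam = 1.0 / 1e4
# Time x_a with P(X <= x_a) = 0.05, from 1 - exp(-lam * x_a) = 0.05.
x_a = -math.log(1.0 - 0.05) / lam


def rayleigh_cdf(x, sigma2):
    """F_X(x) = 1 - exp(-x^2 / (2 sigma^2)) for x >= 0, and 0 otherwise."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2.0 * sigma2))


sigma2 = 100.0
p_within_10 = rayleigh_cdf(10.0, sigma2)  # P(X <= 10)
# Radius r with P(X > r) = exp(-1): r^2 / (2 sigma^2) = 1.
r = math.sqrt(2.0 * sigma2)

print(round(x_a), round(p_within_10, 3), round(r, 2))
```

Solving for a quantile by inverting the cdf, as done for x_a and r, is the same step in both problems; only the form of the exponent changes.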

79 CHAP. 21 RANDOM VARIABLES CONDITIONAL DISTRIBUTIONS Let X be a Poisson r.v. with parameter 2. Find the conditional pmf of X given B = (X is even). From Eq. (2.40), the pdf of X is Then the probability of event B is ;Ik px(k) = e-a - k = 0, 1,... k! Let A = {X is odd). Then the probability of event A is Now f e-ak!=eu- 1 E 7- - k=odd a, Ak Ak (-A)k-,-A -A,-21 e - k = even k = 0 Hence, adding Eqs. (2.101) and (2.1 02), we obtain Now, by Eq. (2.62), the pmf of X given B is If k is even, (X = k) c B and (X = k) n B = (X = k). If k is odd, (X = k) n B = fzi. Hence, P*(k I B) = P(X = k) P(B) 2e-9'' (1+eT2")k! k even k odd Show that the conditional cdf and pdf of X given the event B = (a < X I b) are as follows: 10 x 5: a

80 RANDOM VARIABLES [CHAP 2 Substituting B = (a < X 5 b) in Eq. (2.59), we have P((X I x) n (a < X I b)} FX(x(a<XIb)=P(X<xlac~Ib)= P(a < X I b) Now Hence, xla By Eq. (2.63), the conditional pdf of X given a < X I b is obtained by differentiating Eq. (2.104) with respect to x. Thus, Fx(x Recall the parachuting sky diver problem (Prob. 2.48). Find the probability of the sky diver landing within a 10-m radius from the center of the target area given that the landing is within 50 m from the center of the target area. From Eq. (2.75) (Prob. 2.23) with a2 = 100, we have Setting x = 10 and b = 50 and a = - cc in Eq. (2.104), we obtain Let X = N(0; 02). Find E(X I X > 0) and Var(X 1 X > O), From Eq. (2.52), the pdf of X = N(0; a2) is Then by Eq. (2.105), Hence,

81 CHAP. 21 RANDOM VARIABLES Let y = x2/(2a2). Then dy = x dx/a2, and we get Next, Then by Eq. (2.31), we obtain 1 " fi, -" = - I x2e-x2/(2a2) dx = = 2. (2.108) A r.v. X is said to be without memory, or memoryless, if P(X~x+tlX>t)=P(Xsx) x,t>o Show that if X is a nonnegative continuous r.v. which is memoryless, then X must be an exponential r.v. By Eq. (1.39), the memoryless condition (2.1 10) is equivalent to If X is a nonnegative continuous r.v., then Eq. (2.1 11) becomes or [by Eq. (2.2511, Fx(x + t) - Fx(t) = CFXW - FX(0)ICl - FX(O1 Noting that Fx(0) = 0 and rearranging the above equation, we get Taking the limit as t -+ 0, we obtain F W = F>(O)[l - Fx(x)l where FX(x) denotes the derivative of FX(x). Let RX(x) = 1 - FX(x) Then Eq. (2.112) becomes The solution to this differential equation is given by Rx(x) = kerx(oix where k is an integration constant. Noting that k = Rx(0) = 1 and letting RgO) = - FXO) = -fdo) = - 1, we obtain Rx(x) = e -lx

82 RANDOM VARIABLES [CHAP 2 and hence by Eq. (2.1 13), Thus, by Eq. (2.49), we conclude that X is an exponential r.v. with parameter 1 = fx(0) (>0). Note that the memoryless property Eq. (2.1 10) is also known as the Markov property (see Chap. 5), and it may be equivalently expressed as Let X be the lifetime (in hours) of a component. Then Eq. (2.114) states that the probability that the component will operate for at least x + t hours given that it has been operational for t hours is the same as the initial probability that it will operate for at least x hours. In other words, the component "forgets" how long it has been operating. Note that Eq. (2.115) is satisfied when X is an exponential rev., since P(X > x) = 1 - FAX) = e-" and e-a(x+t) = e-ki -At e. Supplementary Problems Consider the experiment of tossing a coin. Heads appear about once out of every three tosses. If this experiment is repeated, what is the probability of the event that heads appear exactly twice during the first five tosses? Ans Consider the experiment of tossing a fair coin three times (Prob. 1.1). Let X be the r.v. that counts the number of heads in each sample point. Find the following probabilities: (a) P(X I 1); (b) P(X > 1); and (c) P(0 < X < 3) Consider the experiment of throwing two fair dice (Prob. 1.31). Let X be the r.v. indicating the sum of the numbers that appear. (a) What is the range of X? (b) Find (i) P(X = 3); (ii) P(X 5 4); and (iii) P(3 < X 1 7). Ans. (a) Rx = (2, 3,4,..., 12) (b) (i) & ; (ii) 4 ; (iii) Let X denote the number of heads obtained in the flipping of a fair coin twice. (a) Find the pmf of X. (b) Compute the mean and the variance of X Consider the discrete r.v. X that has the pmf Let A = (c: X({) = 1, 3, 5, 7,...}. Find P(A). Ans. 3 px(xk) = (JP xk = 1, 2, 3,...

83 CHAP. 21 RANDOM VARIABLES Consider the function given by (0 otherwise where k is a constant. Find the value of k such that p(x) can be the pmf of a discrete r.v. X. Ans. k = 6/n2 It is known that the floppy disks produced by company A will be defective with probability The company sells the disks in packages of 10 and offers a guarantee of replacement that at most 1 of the 10 disks is defective. Find the probability that a package purchased will have to be replaced. Ans Given that X is a Poisson r.v. and px(0) = , compute E(X) and P(X 2 3). Ans. E(X) = 3, P(X 2 3) = A digital transmission system has an error probability of per digit. Find the probability of three or more errors in lo6 digits by using the Poisson distribution approximation. Ans Show that the pmf px(x) of a Poisson r.v. X with parameter 1 satisfies the following recursion formula: Hint: Use Eq. (2.40). The continuous r.v. X has the pdf where k is a constant. Find the value of k and the cdf of X Ans. k = 6; FX(x) = x10 - x2) 0 < x < 1 otherwise The continuous r.v. X has the pdf where k is a constant. Find the value of k and P(X > 1). Ans. k=j;~(x > 1 )= A r.v. X is defined by the cdf - x2) 0 < x < 2 otherwise (a) Find the value of k. (b) Find the type of X. (c) Find (i) P(4 < X I 1); (ii) P($ < X < 1); and (iii) P(X > 2).

84 RANDOM VARIABLES [CHAP 2 Ans. (a) k = 1. (b) Mixed r.v. (c) (i) $; (ii) ; (iii) It is known that the time (in hours) between consecutive traffic accidents can be described by the exponential r.v. X with parameter 1 = &. Find (i) P(X I 60); (ii) P(X > 120); and (iii) P(10 < X I 100). Ans. (i) 0.632; (ii) 0.135; (iii) Binary data are transmitted over a noisy communication channel in block of 16 binary digits. The probability that a received digit is in error as a result of channel noise is Assume that the errors occurring in various digit positions within a block are independent. (a) Find the mean and the variance of the number of errors per block. (b) Find the probability that the number of errors per block is greater than or equal to 4. Ans. (a) E(X) = 0.16, Var(X) = (b) x Let the continuous r.v. X denote the weight (in pounds) of a package. The range of weight of packages is between 45 and 60 pounds. (a) Determine the probability that a package weighs more than 50 pounds. (b) Find the mean and the variance of the weight of packages. Hint: Assume that X is uniformly distributed over (45, 60). Ans. (a) 4; (b) E(X) = 52.5, Var(X) = In the manufacturing of computer memory chips, company A produces one defective chip for every nine good chips. Let X be time to failure (in months) of chips. It is known that X is an exponential r.v. with parameter 1 = f for a defective chip and A = with a good chip. Find the probability that a chip purchased randomly will fail before (a) six months of use; and (b) one year of use. Ans. (a) 0.501; (b) The median of a continuous r.v. X is the value of x = x, such that P(X 2 x,) = P(X I x,). The mode of X is the value of x = x, at which the pdf of X achieves its maximum value. (a) Find the median and mode of an exponential r.v. X with parameter 1. (b) Find the median and mode of a normal r.v. X = N(p, a2). Ans. (a) x, = (In 2)/1 = , x, = 0 (b) x, = x, = p Let the r.v. 
X denote the number of defective components in a random sample of n components, chosen without replacement from a total of N components, r of which are defective. The r.v. X is known as the hypergeometric r.v. with parameters (N, r, n). (a) Find the prnf of X. (b) Find the mean and variance of X. Hint: To find E(X), note that () = x ( x-1 - ) and (:) = (L)(~ - r, To find Var(X), first find E[X(X - I)]. x=o n-x

85 CHAP. 21 RANDOM VARIABLES A lot consisting of 100 fuses is inspected by the following procedure: Five fuses are selected randomly, and if all five "blow" at the specified amperage, the lot is accepted. Suppose that the lot contains 10 defective fuses. Find the probability of accepting the lot. Hint: Ans Let X be a r.v. equal to the number of defective fuses in the sample of 5 and use the result of Prob Consider the experiment of observing a sequence of Bernoulli trials until exactly r successes occur. Let the r.v. X denote the number of trials needed to observe the rth success. The r-v. X is known as the negative binomial r.v. with parameter p, where p is the probability of a success at each trial. (a) Find the pmf of X. (b) Find the mean and variance of X. Hint: To find E(X), use Maclaurin's series expansions of the negative binomial h(q) = (1-9)-' and its derivatives h'(q) and hw(q), and note that To find Var(X), first find E[(X - r)(x - r - 1)] using hu(q). Ans. (a) px(x) = x = r, r + 1,... r(l - P) (b) EIX) = r(i), Var(X) = - p Suppose the probability that a bit transmitted through a digital communication channel and received in error is 0.1. Assuming that the transmissions are independent events, find the probability that the third error occurs at the 10th bit. Ans A r.v. X is called a Laplace r.v. if its pdf is given by where k is a constant. fx(x)=ke-'ix1 (a) Find the value of k. (b) Find the cdf of X. (c) Find the mean and the variance of X. 1>0, -co<x<oo A r.v. X is called a Cauchy r.v. if its pdf is given by

where $a$ ($> 0$) and $k$ are constants. (a) Find the value of $k$. (b) Find the cdf of $X$. (c) Find the mean and the variance of $X$.

Ans. (a) $k = a/\pi$ (b) $F_X(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}\frac{x}{a}$ (c) $E(X) = 0$, Var($X$) does not exist.

Chapter 3

Multiple Random Variables

3.1 INTRODUCTION

In many applications it is important to study two or more r.v.'s defined on the same sample space. In this chapter, we first consider the case of two r.v.'s, their associated distribution, and some properties, such as independence of the r.v.'s. These concepts are then extended to the case of many r.v.'s defined on the same sample space.

3.2 BIVARIATE RANDOM VARIABLES

A. Definition:

Let $S$ be the sample space of a random experiment. Let $X$ and $Y$ be two r.v.'s. Then the pair $(X, Y)$ is called a bivariate r.v. (or two-dimensional random vector) if each of $X$ and $Y$ associates a real number with every element of $S$. Thus, the bivariate r.v. $(X, Y)$ can be considered as a function that to each point $\zeta$ in $S$ assigns a point $(x, y)$ in the plane (Fig. 3-1). The range space of the bivariate r.v. $(X, Y)$ is denoted by $R_{XY}$ and defined by

$$R_{XY} = \{(x, y);\ \zeta \in S \text{ and } X(\zeta) = x,\ Y(\zeta) = y\}$$

If the r.v.'s $X$ and $Y$ are each, by themselves, discrete r.v.'s, then $(X, Y)$ is called a discrete bivariate r.v. Similarly, if $X$ and $Y$ are each, by themselves, continuous r.v.'s, then $(X, Y)$ is called a continuous bivariate r.v. If one of $X$ and $Y$ is discrete while the other is continuous, then $(X, Y)$ is called a mixed bivariate r.v.

Fig. 3-1 $(X, Y)$ as a function from $S$ to the plane.
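The definition above can be made concrete with a small enumeration. The following is a minimal sketch, assuming a hypothetical experiment of tossing a fair coin twice, with $X$ the number of heads and $Y$ the number of tails; the names `sample_space`, `mapping`, `range_xy`, and `joint_prob` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: toss a fair coin twice, so S = {HH, HT, TH, TT}.
sample_space = ["".join(t) for t in product("HT", repeat=2)]

def X(outcome):
    """Number of heads in the outcome."""
    return outcome.count("H")

def Y(outcome):
    """Number of tails in the outcome."""
    return outcome.count("T")

# The bivariate r.v. (X, Y) assigns a point (x, y) of the plane to each sample point.
mapping = {s: (X(s), Y(s)) for s in sample_space}

# Range space R_XY: the set of points (x, y) that are actually reached.
range_xy = sorted(set(mapping.values()))

# Every sample point is equally likely, so each contributes 1/4 to its image point.
joint_prob = {}
for s, point in mapping.items():
    joint_prob[point] = joint_prob.get(point, Fraction(0)) + Fraction(1, len(sample_space))

print(range_xy)
print(joint_prob)
```

Two sample points (HT and TH) map to the same point (1, 1), which is why the range space has three points although the sample space has four.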

3.3 JOINT DISTRIBUTION FUNCTIONS

A. Definition:

The joint cumulative distribution function (or joint cdf) of $X$ and $Y$, denoted by $F_{XY}(x, y)$, is the function defined by

$$F_{XY}(x, y) = P(X \le x, Y \le y) \tag{3.1}$$

The event $(X \le x, Y \le y)$ in Eq. (3.1) is equivalent to the event $A \cap B$, where $A$ and $B$ are events of $S$ defined by

$$A = \{\zeta \in S;\ X(\zeta) \le x\} \quad\text{and}\quad B = \{\zeta \in S;\ Y(\zeta) \le y\} \tag{3.2}$$

and $P(A) = F_X(x)$, $P(B) = F_Y(y)$. Thus,

$$F_{XY}(x, y) = P(A \cap B) \tag{3.3}$$

If, for particular values of $x$ and $y$, $A$ and $B$ were independent events of $S$, then by Eq. (1.46),

$$F_{XY}(x, y) = P(A \cap B) = P(A)P(B) = F_X(x)F_Y(y)$$

B. Independent Random Variables:

Two r.v.'s $X$ and $Y$ will be called independent if

$$F_{XY}(x, y) = F_X(x)F_Y(y) \tag{3.4}$$

for every value of $x$ and $y$.

C. Properties of $F_{XY}(x, y)$:

The joint cdf of two r.v.'s has many properties analogous to those of the cdf of a single r.v.

1. $0 \le F_{XY}(x, y) \le 1$ (3.5)

2. If $x_1 \le x_2$ and $y_1 \le y_2$, then

$$F_{XY}(x_1, y_1) \le F_{XY}(x_2, y_1) \le F_{XY}(x_2, y_2) \tag{3.6a}$$

$$F_{XY}(x_1, y_1) \le F_{XY}(x_1, y_2) \le F_{XY}(x_2, y_2) \tag{3.6b}$$

3. $\lim_{x \to \infty,\ y \to \infty} F_{XY}(x, y) = F_{XY}(\infty, \infty) = 1$ (3.7)

4. $\lim_{x \to -\infty} F_{XY}(x, y) = F_{XY}(-\infty, y) = 0$ (3.8a)

   $\lim_{y \to -\infty} F_{XY}(x, y) = F_{XY}(x, -\infty) = 0$ (3.8b)

5. $\lim_{x \to a^+} F_{XY}(x, y) = F_{XY}(a^+, y) = F_{XY}(a, y)$ (3.9a)

   $\lim_{y \to b^+} F_{XY}(x, y) = F_{XY}(x, b^+) = F_{XY}(x, b)$ (3.9b)

6. $P(x_1 < X \le x_2, Y \le y) = F_{XY}(x_2, y) - F_{XY}(x_1, y)$ (3.10)

   $P(X \le x, y_1 < Y \le y_2) = F_{XY}(x, y_2) - F_{XY}(x, y_1)$ (3.11)

7. If $x_1 \le x_2$ and $y_1 \le y_2$, then

$$F_{XY}(x_2, y_2) - F_{XY}(x_1, y_2) - F_{XY}(x_2, y_1) + F_{XY}(x_1, y_1) \ge 0 \tag{3.12}$$

Note that the left-hand side of Eq. (3.12) is equal to $P(x_1 < X \le x_2, y_1 < Y \le y_2)$ (Prob. 3.5).
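Property 7 is the one that is easiest to overlook when checking whether a function can be a joint cdf. The sketch below evaluates the left-hand side of Eq. (3.12) for two illustrative functions chosen here as assumptions: the product of two unit-rate exponential cdf's, which is a valid joint cdf, and the candidate $1 - e^{-(x+y)}$, which is bounded and nondecreasing in each argument but gives a negative rectangle probability. The helper name `rect_prob` is hypothetical.

```python
import math

def rect_prob(F, x1, x2, y1, y2):
    """Left-hand side of Eq. (3.12); equals P(x1 < X <= x2, y1 < Y <= y2) when F is a joint cdf."""
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

def F_indep(x, y):
    # Assumed example: product of two unit-rate exponential cdf's (a valid joint cdf).
    if x < 0 or y < 0:
        return 0.0
    return (1 - math.exp(-x)) * (1 - math.exp(-y))

def F_candidate(x, y):
    # Assumed candidate: satisfies 0 <= F <= 1 and the limit properties, yet fails Eq. (3.12).
    if x < 0 or y < 0:
        return 0.0
    return 1 - math.exp(-(x + y))

valid = rect_prob(F_indep, 1, 2, 1, 2)        # nonnegative, as required
invalid = rect_prob(F_candidate, 1, 2, 1, 2)  # negative, so not a joint cdf

print(valid, invalid)
```

For the product form the rectangle probability factors as $(e^{-1} - e^{-2})^2$, while the candidate yields exactly the negative of that value.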

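D. Marginal Distribution Functions:

Now

$$\lim_{y \to \infty} (X \le x, Y \le y) = (X \le x, Y \le \infty) = (X \le x)$$

since the condition $y \le \infty$ is always satisfied. Then

$$\lim_{y \to \infty} F_{XY}(x, y) = F_{XY}(x, \infty) = F_X(x) \tag{3.13}$$

Similarly,

$$\lim_{x \to \infty} F_{XY}(x, y) = F_{XY}(\infty, y) = F_Y(y) \tag{3.14}$$

The cdf's $F_X(x)$ and $F_Y(y)$, when obtained by Eqs. (3.13) and (3.14), are referred to as the marginal cdf's of $X$ and $Y$, respectively.

3.4 DISCRETE RANDOM VARIABLES-JOINT PROBABILITY MASS FUNCTIONS

A. Joint Probability Mass Functions:

Let $(X, Y)$ be a discrete bivariate r.v., and let $(X, Y)$ take on the values $(x_i, y_j)$ for a certain allowable set of integers $i$ and $j$. Let

$$p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j) \tag{3.15}$$

The function $p_{XY}(x_i, y_j)$ is called the joint probability mass function (joint pmf) of $(X, Y)$.

B. Properties of $p_{XY}(x_i, y_j)$:

1. $0 \le p_{XY}(x_i, y_j) \le 1$ (3.16)

2. $\sum_{x_i} \sum_{y_j} p_{XY}(x_i, y_j) = 1$ (3.17)

3. $P[(X, Y) \in A] = \sum\sum_{(x_i, y_j) \in R_A} p_{XY}(x_i, y_j)$ (3.18)

where the summation is over the points $(x_i, y_j)$ in the range space $R_A$ corresponding to the event $A$. The joint cdf of a discrete bivariate r.v. $(X, Y)$ is given by

$$F_{XY}(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} p_{XY}(x_i, y_j) \tag{3.19}$$

C. Marginal Probability Mass Functions:

Suppose that for a fixed value $X = x_i$, the r.v. $Y$ can take on only the possible values $y_j$ ($j = 1, 2, \ldots, n$). Then

$$P(X = x_i) = p_X(x_i) = \sum_{y_j} p_{XY}(x_i, y_j) \tag{3.20}$$

where the summation is taken over all possible pairs $(x_i, y_j)$ with $x_i$ fixed. Similarly,

$$P(Y = y_j) = p_Y(y_j) = \sum_{x_i} p_{XY}(x_i, y_j) \tag{3.21}$$

where the summation is taken over all possible pairs $(x_i, y_j)$ with $y_j$ fixed. The pmf's $p_X(x_i)$ and $p_Y(y_j)$, when obtained by Eqs. (3.20) and (3.21), are referred to as the marginal pmf's of $X$ and $Y$, respectively.

D. Independent Random Variables:

If $X$ and $Y$ are independent r.v.'s, then (Prob. 3.10)

$$p_{XY}(x_i, y_j) = p_X(x_i)\,p_Y(y_j) \tag{3.22}$$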

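3.5 CONTINUOUS RANDOM VARIABLES-JOINT PROBABILITY DENSITY FUNCTIONS

A. Joint Probability Density Functions:

Let $(X, Y)$ be a continuous bivariate r.v. with cdf $F_{XY}(x, y)$ and let

$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} \tag{3.23}$$

The function $f_{XY}(x, y)$ is called the joint probability density function (joint pdf) of $(X, Y)$. By integrating Eq. (3.23), we have

$$F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(\xi, \eta)\, d\eta\, d\xi \tag{3.24}$$

B. Properties of $f_{XY}(x, y)$:

1. $f_{XY}(x, y) \ge 0$ (3.25)

2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy = 1$ (3.26)

3. $f_{XY}(x, y)$ is continuous for all values of $x$ or $y$ except possibly a finite set.

4. $P[(X, Y) \in A] = \iint_{R_A} f_{XY}(x, y)\, dx\, dy$ (3.27)

5. $P(a < X \le b, c < Y \le d) = \int_c^d \int_a^b f_{XY}(x, y)\, dx\, dy$ (3.28)

Since $P(X = a) = 0 = P(Y = c)$ [by Eq. (2.19)], it follows that

$$P(a < X \le b, c < Y \le d) = P(a \le X \le b, c \le Y \le d) = P(a \le X < b, c \le Y < d) = P(a < X < b, c < Y < d) = \int_c^d \int_a^b f_{XY}(x, y)\, dx\, dy \tag{3.29}$$

C. Marginal Probability Density Functions:

By Eq. (3.13),

$$F_X(x) = F_{XY}(x, \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{XY}(\xi, \eta)\, d\eta\, d\xi$$

Hence

$$f_X(x) = \frac{dF_X(x)}{dx} = \int_{-\infty}^{\infty} f_{XY}(x, \eta)\, d\eta$$

or

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy \tag{3.30}$$

Similarly,

$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx \tag{3.31}$$

The pdf's $f_X(x)$ and $f_Y(y)$, when obtained by Eqs. (3.30) and (3.31), are referred to as the marginal pdf's of $X$ and $Y$, respectively.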
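The marginal pdf in Eq. (3.30) is just an integral over the other variable, which can be approximated numerically. The following is a rough sketch under stated assumptions: the joint pdf $f_{XY}(x, y) = x + y$ on the unit square is a hypothetical example (not taken from the text), and the midpoint rule is one simple choice of quadrature.

```python
def joint_pdf(x, y):
    # Assumed example: f(x, y) = x + y on the open unit square, 0 elsewhere.
    return x + y if 0 < x < 1 and 0 < y < 1 else 0.0

def marginal_x(x, n=1000):
    """Approximate f_X(x) = integral of f_XY(x, y) dy over (0, 1) by the midpoint rule."""
    h = 1.0 / n
    return sum(joint_pdf(x, (j + 0.5) * h) for j in range(n)) * h

def total_mass(n=200):
    """Approximate the double integral of f_XY over the unit square; should be close to 1."""
    h = 1.0 / n
    return sum(marginal_x((i + 0.5) * h, n) for i in range(n)) * h

print(marginal_x(0.25))  # analytically, f_X(x) = x + 1/2 for this example
print(total_mass())
```

Because the integrand is linear in each variable, the midpoint rule is exact here up to rounding; for a general joint pdf the step count `n` would need to be chosen with more care.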

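D. Independent Random Variables:

If $X$ and $Y$ are independent r.v.'s, by Eq. (3.4),

$$F_{XY}(x, y) = F_X(x)F_Y(y)$$

Then

$$\frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} = \frac{\partial F_X(x)}{\partial x} \frac{\partial F_Y(y)}{\partial y}$$

or

$$f_{XY}(x, y) = f_X(x) f_Y(y) \tag{3.32}$$

analogous with Eq. (3.22) for the discrete case. Thus, we say that the continuous r.v.'s $X$ and $Y$ are independent r.v.'s if and only if Eq. (3.32) is satisfied.

3.6 CONDITIONAL DISTRIBUTIONS

A. Conditional Probability Mass Functions:

If $(X, Y)$ is a discrete bivariate r.v. with joint pmf $p_{XY}(x_i, y_j)$, then the conditional pmf of $Y$, given that $X = x_i$, is defined by

$$p_{Y|X}(y_j \mid x_i) = \frac{p_{XY}(x_i, y_j)}{p_X(x_i)} \qquad p_X(x_i) > 0 \tag{3.33}$$

Similarly, we can define $p_{X|Y}(x_i \mid y_j)$ as

$$p_{X|Y}(x_i \mid y_j) = \frac{p_{XY}(x_i, y_j)}{p_Y(y_j)} \qquad p_Y(y_j) > 0 \tag{3.34}$$

B. Properties of $p_{Y|X}(y_j \mid x_i)$:

1. $0 \le p_{Y|X}(y_j \mid x_i) \le 1$ (3.35)

2. $\sum_{y_j} p_{Y|X}(y_j \mid x_i) = 1$ (3.36)

Notice that if $X$ and $Y$ are independent, then by Eq. (3.22),

$$p_{Y|X}(y_j \mid x_i) = p_Y(y_j) \quad\text{and}\quad p_{X|Y}(x_i \mid y_j) = p_X(x_i) \tag{3.37}$$

C. Conditional Probability Density Functions:

If $(X, Y)$ is a continuous bivariate r.v. with joint pdf $f_{XY}(x, y)$, then the conditional pdf of $Y$, given that $X = x$, is defined by

$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} \qquad f_X(x) > 0 \tag{3.38}$$

Similarly, we can define $f_{X|Y}(x \mid y)$ as

$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} \qquad f_Y(y) > 0 \tag{3.39}$$

D. Properties of $f_{Y|X}(y \mid x)$: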

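1. $f_{Y|X}(y \mid x) \ge 0$ (3.40)

2. $\int_{-\infty}^{\infty} f_{Y|X}(y \mid x)\, dy = 1$ (3.41)

As in the discrete case, if $X$ and $Y$ are independent, then by Eq. (3.32),

$$f_{Y|X}(y \mid x) = f_Y(y) \quad\text{and}\quad f_{X|Y}(x \mid y) = f_X(x) \tag{3.42}$$

3.7 COVARIANCE AND CORRELATION COEFFICIENT

The $(k, n)$th moment of a bivariate r.v. $(X, Y)$ is defined by

$$m_{kn} = E(X^k Y^n) = \begin{cases} \displaystyle\sum_{y_j} \sum_{x_i} x_i^k y_j^n\, p_{XY}(x_i, y_j) & \text{(discrete case)} \\[2ex] \displaystyle\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^k y^n f_{XY}(x, y)\, dx\, dy & \text{(continuous case)} \end{cases} \tag{3.43}$$

If $n = 0$, we obtain the $k$th moment of $X$, and if $k = 0$, we obtain the $n$th moment of $Y$. Thus,

$$m_{10} = E(X) = \mu_X \quad\text{and}\quad m_{01} = E(Y) = \mu_Y \tag{3.44}$$

If $(X, Y)$ is a discrete bivariate r.v., then using Eqs. (3.43), (3.20), and (3.21), we obtain

$$\mu_X = E(X) = \sum_{y_j} \sum_{x_i} x_i\, p_{XY}(x_i, y_j) = \sum_{x_i} x_i \Big[\sum_{y_j} p_{XY}(x_i, y_j)\Big] = \sum_{x_i} x_i\, p_X(x_i) \tag{3.45a}$$

$$\mu_Y = E(Y) = \sum_{x_i} \sum_{y_j} y_j\, p_{XY}(x_i, y_j) = \sum_{y_j} y_j \Big[\sum_{x_i} p_{XY}(x_i, y_j)\Big] = \sum_{y_j} y_j\, p_Y(y_j) \tag{3.45b}$$

Similarly, we have

$$E(X^2) = \sum_{y_j} \sum_{x_i} x_i^2\, p_{XY}(x_i, y_j) = \sum_{x_i} x_i^2\, p_X(x_i) \tag{3.46a}$$

$$E(Y^2) = \sum_{x_i} \sum_{y_j} y_j^2\, p_{XY}(x_i, y_j) = \sum_{y_j} y_j^2\, p_Y(y_j) \tag{3.46b}$$

If $(X, Y)$ is a continuous bivariate r.v., then using Eqs. (3.43), (3.30), and (3.31), we obtain

$$\mu_X = E(X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f_{XY}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} x f_X(x)\, dx \tag{3.47a}$$

$$\mu_Y = E(Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f_{XY}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} y f_Y(y)\, dy \tag{3.47b}$$

Similarly, we have

$$E(X^2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^2 f_{XY}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} x^2 f_X(x)\, dx \tag{3.48a}$$

$$E(Y^2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y^2 f_{XY}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} y^2 f_Y(y)\, dy \tag{3.48b}$$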

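The variances of $X$ and $Y$ are easily obtained by using Eq. (2.31). The (1, 1)th joint moment of $(X, Y)$,

$$m_{11} = E(XY) \tag{3.49}$$

is called the correlation of $X$ and $Y$. If $E(XY) = 0$, then we say that $X$ and $Y$ are orthogonal. The covariance of $X$ and $Y$, denoted by Cov($X$, $Y$) or $\sigma_{XY}$, is defined by

$$\operatorname{Cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] \tag{3.50}$$

Expanding Eq. (3.50), we obtain

$$\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) \tag{3.51}$$

If Cov($X$, $Y$) = 0, then we say that $X$ and $Y$ are uncorrelated. From Eq. (3.51), we see that $X$ and $Y$ are uncorrelated if

$$E(XY) = E(X)E(Y) \tag{3.52}$$

Note that if $X$ and $Y$ are independent, then it can be shown that they are uncorrelated (Prob. 3.32), but the converse is not true in general; that is, the fact that $X$ and $Y$ are uncorrelated does not, in general, imply that they are independent (Probs. 3.33, 3.34, and 3.38). The correlation coefficient, denoted by $\rho(X, Y)$ or $\rho_{XY}$, is defined by

$$\rho(X, Y) = \rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \tag{3.53}$$

It can be shown that (Prob. 3.36)

$$|\rho_{XY}| \le 1 \quad\text{or}\quad -1 \le \rho_{XY} \le 1 \tag{3.54}$$

Note that the correlation coefficient of $X$ and $Y$ is a measure of linear dependence between $X$ and $Y$ (see Prob. 4.40).

3.8 CONDITIONAL MEANS AND CONDITIONAL VARIANCES

If $(X, Y)$ is a discrete bivariate r.v. with joint pmf $p_{XY}(x_i, y_j)$, then the conditional mean (or conditional expectation) of $Y$, given that $X = x_i$, is defined by

$$\mu_{Y|x_i} = E(Y \mid x_i) = \sum_{y_j} y_j\, p_{Y|X}(y_j \mid x_i) \tag{3.55}$$

The conditional variance of $Y$, given that $X = x_i$, is defined by

$$\sigma_{Y|x_i}^2 = \operatorname{Var}(Y \mid x_i) = E[(Y - \mu_{Y|x_i})^2 \mid x_i] = \sum_{y_j} (y_j - \mu_{Y|x_i})^2\, p_{Y|X}(y_j \mid x_i) \tag{3.56}$$

which can be reduced to

$$\operatorname{Var}(Y \mid x_i) = E(Y^2 \mid x_i) - [E(Y \mid x_i)]^2 \tag{3.57}$$

The conditional mean of $X$, given that $Y = y_j$, and the conditional variance of $X$, given that $Y = y_j$, are given by similar expressions. Note that the conditional mean of $Y$, given that $X = x_i$, is a function of $x_i$ alone. Similarly, the conditional mean of $X$, given that $Y = y_j$, is a function of $y_j$ alone.

If $(X, Y)$ is a continuous bivariate r.v. with joint pdf $f_{XY}(x, y)$, the conditional mean of $Y$, given that $X = x$, is defined by

$$\mu_{Y|x} = E(Y \mid x) = \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x)\, dy \tag{3.58}$$

The conditional variance of $Y$, given that $X = x$, is defined by

$$\sigma_{Y|x}^2 = \operatorname{Var}(Y \mid x) = E[(Y - \mu_{Y|x})^2 \mid x] = \int_{-\infty}^{\infty} (y - \mu_{Y|x})^2 f_{Y|X}(y \mid x)\, dy \tag{3.59}$$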

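which can be reduced to

$$\operatorname{Var}(Y \mid x) = E(Y^2 \mid x) - [E(Y \mid x)]^2 \tag{3.60}$$

The conditional mean of $X$, given that $Y = y$, and the conditional variance of $X$, given that $Y = y$, are given by similar expressions. Note that the conditional mean of $Y$, given that $X = x$, is a function of $x$ alone. Similarly, the conditional mean of $X$, given that $Y = y$, is a function of $y$ alone (Prob. 3.40).

3.9 N-VARIATE RANDOM VARIABLES

In previous sections, the extension from one r.v. to two r.v.'s has been made. The concepts can be extended easily to any number of r.v.'s defined on the same sample space. In this section we briefly describe some of the extensions.

A. Definitions:

Given an experiment, the $n$-tuple of r.v.'s $(X_1, X_2, \ldots, X_n)$ is called an $n$-variate r.v. (or $n$-dimensional random vector) if each $X_i$, $i = 1, 2, \ldots, n$, associates a real number with every sample point $\zeta \in S$. Thus, an $n$-variate r.v. is simply a rule associating an $n$-tuple of real numbers with every $\zeta \in S$.

Let $(X_1, \ldots, X_n)$ be an $n$-variate r.v. on $S$. Then its joint cdf is defined as

$$F_{X_1 \cdots X_n}(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n) \tag{3.61}$$

Note that

$$F_{X_1 \cdots X_n}(\infty, \ldots, \infty) = 1 \tag{3.62}$$

The marginal joint cdf's are obtained by setting the appropriate $X_i$'s to $+\infty$ in Eq. (3.61). For example,

$$F_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1}) = F_{X_1 \cdots X_{n-1} X_n}(x_1, \ldots, x_{n-1}, \infty) \tag{3.63}$$

$$F_{X_1 X_2}(x_1, x_2) = F_{X_1 X_2 X_3 \cdots X_n}(x_1, x_2, \infty, \ldots, \infty) \tag{3.64}$$

A discrete $n$-variate r.v. will be described by a joint pmf defined by

$$p_{X_1 \cdots X_n}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n) \tag{3.65}$$

The probability of any $n$-dimensional event $A$ is found by summing Eq. (3.65) over the points in the $n$-dimensional range space $R_A$ corresponding to the event $A$:

$$P[(X_1, \ldots, X_n) \in A] = \sum \cdots \sum_{(x_1, \ldots, x_n) \in R_A} p_{X_1 \cdots X_n}(x_1, \ldots, x_n) \tag{3.66}$$

Properties of $p_{X_1 \cdots X_n}(x_1, \ldots, x_n)$:

1. $0 \le p_{X_1 \cdots X_n}(x_1, \ldots, x_n) \le 1$ (3.67)

2. $\sum_{x_1} \cdots \sum_{x_n} p_{X_1 \cdots X_n}(x_1, \ldots, x_n) = 1$ (3.68)

The marginal pmf's of one or more of the r.v.'s are obtained by summing Eq. (3.65) appropriately. For example,

$$p_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1}) = \sum_{x_n} p_{X_1 \cdots X_n}(x_1, \ldots, x_n) \tag{3.69}$$

$$p_{X_1}(x_1) = \sum_{x_2} \cdots \sum_{x_n} p_{X_1 \cdots X_n}(x_1, \ldots, x_n) \tag{3.70}$$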

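Conditional pmf's are defined similarly. For example,

$$p_{X_n | X_1 \cdots X_{n-1}}(x_n \mid x_1, \ldots, x_{n-1}) = \frac{p_{X_1 \cdots X_n}(x_1, \ldots, x_n)}{p_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1})} \tag{3.71}$$

A continuous $n$-variate r.v. will be described by a joint pdf defined by

$$f_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \frac{\partial^n F_{X_1 \cdots X_n}(x_1, \ldots, x_n)}{\partial x_1 \cdots \partial x_n} \tag{3.72}$$

Then

$$F_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f_{X_1 \cdots X_n}(\xi_1, \ldots, \xi_n)\, d\xi_1 \cdots d\xi_n \tag{3.73}$$

and

$$P[(X_1, \ldots, X_n) \in A] = \int \cdots \int_{(x_1, \ldots, x_n) \in R_A} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n \tag{3.74}$$

Properties of $f_{X_1 \cdots X_n}(x_1, \ldots, x_n)$:

1. $f_{X_1 \cdots X_n}(x_1, \ldots, x_n) \ge 0$ (3.75)

2. $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n = 1$ (3.76)

The marginal pdf's of one or more of the r.v.'s are obtained by integrating Eq. (3.72) appropriately. For example,

$$f_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1}) = \int_{-\infty}^{\infty} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_n \tag{3.77}$$

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_2 \cdots dx_n \tag{3.78}$$

Conditional pdf's are defined similarly. For example,

$$f_{X_n | X_1 \cdots X_{n-1}}(x_n \mid x_1, \ldots, x_{n-1}) = \frac{f_{X_1 \cdots X_n}(x_1, \ldots, x_n)}{f_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1})} \tag{3.79}$$

The r.v.'s $X_1, \ldots, X_n$ are said to be mutually independent if

$$p_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} p_{X_i}(x_i) \tag{3.80}$$

for the discrete case, and

$$f_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) \tag{3.81}$$

for the continuous case. The mean (or expectation) of $X_i$ in $(X_1, \ldots, X_n)$ is defined as

$$\mu_i = E(X_i) = \begin{cases} \displaystyle\sum_{x_1} \cdots \sum_{x_n} x_i\, p_{X_1 \cdots X_n}(x_1, \ldots, x_n) & \text{(discrete case)} \\[2ex] \displaystyle\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n & \text{(continuous case)} \end{cases} \tag{3.82}$$

The variance of $X_i$ is defined as

$$\sigma_i^2 = \operatorname{Var}(X_i) = E[(X_i - \mu_i)^2] \tag{3.83}$$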

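The covariance of $X_i$ and $X_j$ is defined as

$$\sigma_{ij} = \operatorname{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)] \tag{3.84}$$

The correlation coefficient of $X_i$ and $X_j$ is defined as

$$\rho_{ij} = \frac{\operatorname{Cov}(X_i, X_j)}{\sigma_i \sigma_j} = \frac{\sigma_{ij}}{\sigma_i \sigma_j} \tag{3.85}$$

3.10 SPECIAL DISTRIBUTIONS

A. Multinomial Distribution:

The multinomial distribution is an extension of the binomial distribution. An experiment is termed a multinomial trial with parameters $p_1, p_2, \ldots, p_k$ if it has the following conditions:

1. The experiment has $k$ possible outcomes that are mutually exclusive and exhaustive, say $A_1, A_2, \ldots, A_k$.

2. $P(A_i) = p_i$, $i = 1, \ldots, k$, and $\sum_{i=1}^{k} p_i = 1$ (3.86)

Consider an experiment which consists of $n$ repeated, independent, multinomial trials with parameters $p_1, p_2, \ldots, p_k$. Let $X_i$ be the r.v. denoting the number of trials which result in $A_i$. Then $(X_1, X_2, \ldots, X_k)$ is called the multinomial r.v. with parameters $(n, p_1, p_2, \ldots, p_k)$ and its pmf is given by (Prob. 3.46)

$$p_{X_1 X_2 \cdots X_k}(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \tag{3.87}$$

for $x_i = 0, 1, \ldots, n$, $i = 1, \ldots, k$, such that $\sum_{i=1}^{k} x_i = n$.

Note that when $k = 2$, the multinomial distribution reduces to the binomial distribution.

B. Bivariate Normal Distribution:

A bivariate r.v. $(X, Y)$ is said to be a bivariate normal (or gaussian) r.v. if its joint pdf is given by

$$f_{XY}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y (1 - \rho^2)^{1/2}} \exp\left[-\tfrac{1}{2}\, q(x, y)\right] \tag{3.88}$$

where

$$q(x, y) = \frac{1}{1 - \rho^2} \left[ \left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho \left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2 \right] \tag{3.89}$$

and $\mu_X$, $\mu_Y$, $\sigma_X^2$, $\sigma_Y^2$ are the means and variances of $X$ and $Y$, respectively. It can be shown that $\rho$ is the correlation coefficient of $X$ and $Y$ (Prob. 3.50) and that $X$ and $Y$ are independent when $\rho = 0$ (Prob. 3.49).

C. N-variate Normal Distribution:

Let $(X_1, \ldots, X_n)$ be an $n$-variate r.v. defined on a sample space $S$. Let $\mathbf{X}$ be an $n$-dimensional random vector expressed as an $n \times 1$ matrix:

$$\mathbf{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} \tag{3.90}$$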

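Let $\mathbf{x}$ be an $n$-dimensional vector ($n \times 1$ matrix) defined by

$$\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \tag{3.91}$$

The $n$-variate r.v. $(X_1, \ldots, X_n)$ is called an $n$-variate normal r.v. if its joint pdf is given by

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\det K|^{1/2}} \exp\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T K^{-1} (\mathbf{x} - \boldsymbol{\mu})\right] \tag{3.92}$$

where $T$ denotes the "transpose," $\boldsymbol{\mu}$ is the vector mean, $K$ is the covariance matrix given by

$$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \begin{bmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{bmatrix} \tag{3.93}$$

$$K = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix} \qquad \sigma_{ij} = \operatorname{Cov}(X_i, X_j) \tag{3.94}$$

and $\det K$ is the determinant of the matrix $K$. Note that $f_{\mathbf{X}}(\mathbf{x})$ stands for $f_{X_1 \cdots X_n}(x_1, \ldots, x_n)$.

Solved Problems

BIVARIATE RANDOM VARIABLES AND JOINT DISTRIBUTION FUNCTIONS

3.1. Consider an experiment of tossing a fair coin twice. Let $(X, Y)$ be a bivariate r.v., where $X$ is the number of heads that occurs in the two tosses and $Y$ is the number of tails that occurs in the two tosses.

(a) What is the range $R_X$ of $X$?
(b) What is the range $R_Y$ of $Y$?
(c) Find and sketch the range $R_{XY}$ of $(X, Y)$.
(d) Find $P(X = 2, Y = 0)$, $P(X = 0, Y = 2)$, and $P(X = 1, Y = 1)$.

The sample space $S$ of the experiment is

$$S = \{HH, HT, TH, TT\}$$

(a) $R_X = \{0, 1, 2\}$
(b) $R_Y = \{0, 1, 2\}$
(c) $R_{XY} = \{(2, 0), (1, 1), (0, 2)\}$, which is sketched in Fig. 3-2.
(d) Since the coin is fair, we have

$$P(X = 2, Y = 0) = P\{HH\} = \tfrac{1}{4} \qquad P(X = 0, Y = 2) = P\{TT\} = \tfrac{1}{4} \qquad P(X = 1, Y = 1) = P\{HT, TH\} = \tfrac{1}{2}$$

3.2. Consider a bivariate r.v. $(X, Y)$. Find the region of the $xy$ plane corresponding to the events

$$A = \{X + Y \le 2\} \qquad B = \{X^2 + Y^2 < 4\} \qquad C = \{\min(X, Y) \le 2\} \qquad D = \{\max(X, Y) \le 2\}$$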

Fig. 3-2

Fig. 3-3 Regions corresponding to certain joint events [panels (a) to (d)].

The region corresponding to event $A$ is expressed by $x + y \le 2$, which is shown in Fig. 3-3(a), that is, the region below and including the straight line $x + y = 2$.

The region corresponding to event $B$ is expressed by $x^2 + y^2 < 2^2$, which is shown in Fig. 3-3(b), that is, the region within the circle with its center at the origin and radius 2.

The region corresponding to event $C$ is shown in Fig. 3-3(c), which is found by noting that

$$\{\min(X, Y) \le 2\} = (X \le 2) \cup (Y \le 2)$$

The region corresponding to event $D$ is shown in Fig. 3-3(d), which is found by noting that

$$\{\max(X, Y) \le 2\} = (X \le 2) \cap (Y \le 2)$$

3.3. Verify Eqs. (3.7), (3.8a), and (3.8b).

Since $\{X \le \infty, Y \le \infty\} = S$ and by Eq. (1.22),

$$P(X \le \infty, Y \le \infty) = F_{XY}(\infty, \infty) = P(S) = 1$$

Next, as we know, from Eq. (2.8),

$$P(X = -\infty) = P(Y = -\infty) = 0$$

Since

$$(X = -\infty, Y \le y) \subset (X = -\infty) \quad\text{and}\quad (X \le x, Y = -\infty) \subset (Y = -\infty)$$

and since the probability of an event cannot exceed that of an event containing it, we have

$$P(X = -\infty, Y \le y) = F_{XY}(-\infty, y) = 0$$

$$P(X \le x, Y = -\infty) = F_{XY}(x, -\infty) = 0$$

3.4. Verify Eqs. (3.10) and (3.11).

Clearly

$$(X \le x_2, Y \le y) = (X \le x_1, Y \le y) \cup (x_1 < X \le x_2, Y \le y)$$

The two events on the right-hand side are disjoint; hence, by the additivity of probability for disjoint events,

$$P(X \le x_2, Y \le y) = P(X \le x_1, Y \le y) + P(x_1 < X \le x_2, Y \le y)$$

or

$$P(x_1 < X \le x_2, Y \le y) = P(X \le x_2, Y \le y) - P(X \le x_1, Y \le y) = F_{XY}(x_2, y) - F_{XY}(x_1, y)$$

Similarly,

$$(X \le x, Y \le y_2) = (X \le x, Y \le y_1) \cup (X \le x, y_1 < Y \le y_2)$$

Again the two events on the right-hand side are disjoint, hence

$$P(X \le x, Y \le y_2) = P(X \le x, Y \le y_1) + P(X \le x, y_1 < Y \le y_2)$$

or

$$P(X \le x, y_1 < Y \le y_2) = P(X \le x, Y \le y_2) - P(X \le x, Y \le y_1) = F_{XY}(x, y_2) - F_{XY}(x, y_1)$$

3.5. Verify Eq. (3.12).

Clearly

$$(x_1 < X \le x_2, Y \le y_2) = (x_1 < X \le x_2, Y \le y_1) \cup (x_1 < X \le x_2, y_1 < Y \le y_2)$$

The two events on the right-hand side are disjoint; hence

$$P(x_1 < X \le x_2, Y \le y_2) = P(x_1 < X \le x_2, Y \le y_1) + P(x_1 < X \le x_2, y_1 < Y \le y_2)$$

Then using Eq. (3.10), we obtain

$$P(x_1 < X \le x_2, y_1 < Y \le y_2) = P(x_1 < X \le x_2, Y \le y_2) - P(x_1 < X \le x_2, Y \le y_1)$$

$$= F_{XY}(x_2, y_2) - F_{XY}(x_1, y_2) - F_{XY}(x_2, y_1) + F_{XY}(x_1, y_1) \tag{3.95}$$

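Since the probability must be nonnegative, we conclude that

$$F_{XY}(x_2, y_2) - F_{XY}(x_1, y_2) - F_{XY}(x_2, y_1) + F_{XY}(x_1, y_1) \ge 0$$

if $x_2 \ge x_1$ and $y_2 \ge y_1$.

3.6. Consider a function

$$F(x, y) = \begin{cases} 1 - e^{-(x+y)} & 0 \le x < \infty,\ 0 \le y < \infty \\ 0 & \text{otherwise} \end{cases}$$

Can this function be a joint cdf of a bivariate r.v. $(X, Y)$?

It is clear that $F(x, y)$ satisfies properties 1 to 5 of a cdf [Eqs. (3.5) to (3.9b)]. But substituting $F(x, y)$ in Eq. (3.12) and setting $x_2 = y_2 = 2$ and $x_1 = y_1 = 1$, we get

$$F(2, 2) - F(1, 2) - F(2, 1) + F(1, 1) = (1 - e^{-4}) - (1 - e^{-3}) - (1 - e^{-3}) + (1 - e^{-2}) = -e^{-4} + 2e^{-3} - e^{-2} = -(e^{-2} - e^{-1})^2 < 0$$

Thus, property 7 [Eq. (3.12)] is not satisfied. Hence $F(x, y)$ cannot be a joint cdf.

3.7. Consider a bivariate r.v. $(X, Y)$. Show that if $X$ and $Y$ are independent, then every event of the form $(a < X \le b)$ is independent of every event of the form $(c < Y \le d)$.

By definition (3.4), if $X$ and $Y$ are independent, we have

$$F_{XY}(x, y) = F_X(x)F_Y(y)$$

Setting $x_1 = a$, $x_2 = b$, $y_1 = c$, and $y_2 = d$ in Eq. (3.95) (Prob. 3.5), we obtain

$$P(a < X \le b, c < Y \le d) = F_{XY}(b, d) - F_{XY}(a, d) - F_{XY}(b, c) + F_{XY}(a, c)$$

$$= F_X(b)F_Y(d) - F_X(a)F_Y(d) - F_X(b)F_Y(c) + F_X(a)F_Y(c)$$

$$= [F_X(b) - F_X(a)][F_Y(d) - F_Y(c)] = P(a < X \le b)\,P(c < Y \le d)$$

which indicates that event $(a < X \le b)$ and event $(c < Y \le d)$ are independent [Eq. (1.46)].

3.8. The joint cdf of a bivariate r.v. $(X, Y)$ is given by

$$F_{XY}(x, y) = \begin{cases} (1 - e^{-\alpha x})(1 - e^{-\beta y}) & x \ge 0,\ y \ge 0,\ \alpha, \beta > 0 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find the marginal cdf's of $X$ and $Y$.
(b) Show that $X$ and $Y$ are independent.
(c) Find $P(X \le 1, Y \le 1)$, $P(X \le 1)$, $P(Y > 1)$, and $P(X > x, Y > y)$.

(a) By Eqs. (3.13) and (3.14), the marginal cdf's of $X$ and $Y$ are

$$F_X(x) = F_{XY}(x, \infty) = \begin{cases} 1 - e^{-\alpha x} & x \ge 0 \\ 0 & x < 0 \end{cases} \qquad F_Y(y) = F_{XY}(\infty, y) = \begin{cases} 1 - e^{-\beta y} & y \ge 0 \\ 0 & y < 0 \end{cases}$$

(b) Since $F_{XY}(x, y) = F_X(x)F_Y(y)$, $X$ and $Y$ are independent.

(c) $P(X \le 1, Y \le 1) = F_{XY}(1, 1) = (1 - e^{-\alpha})(1 - e^{-\beta})$

$P(X \le 1) = F_X(1) = 1 - e^{-\alpha}$

$P(Y > 1) = 1 - P(Y \le 1) = 1 - F_Y(1) = e^{-\beta}$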

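By De Morgan's law (1.15), we have

$$\overline{(X > x) \cap (Y > y)} = \overline{(X > x)} \cup \overline{(Y > y)} = (X \le x) \cup (Y \le y)$$

Then, by the formula for the probability of a union of two events,

$$P[\overline{(X > x) \cap (Y > y)}] = P(X \le x) + P(Y \le y) - P(X \le x, Y \le y) = F_X(x) + F_Y(y) - F_{XY}(x, y)$$

$$= (1 - e^{-\alpha x}) + (1 - e^{-\beta y}) - (1 - e^{-\alpha x})(1 - e^{-\beta y}) = 1 - e^{-\alpha x} e^{-\beta y}$$

Finally, by Eq. (1.25), we obtain

$$P(X > x, Y > y) = 1 - P[\overline{(X > x) \cap (Y > y)}] = e^{-\alpha x} e^{-\beta y}$$

3.9. The joint cdf of a bivariate r.v. $(X, Y)$ is given by

$$F_{XY}(x, y) = \begin{cases} 0 & x < 0 \text{ or } y < 0 \\ p_1 & 0 \le x < a,\ 0 \le y < b \\ p_2 & x \ge a,\ 0 \le y < b \\ p_3 & 0 \le x < a,\ y \ge b \\ 1 & x \ge a,\ y \ge b \end{cases}$$

(a) Find the marginal cdf's of $X$ and $Y$.
(b) Find the conditions on $p_1$, $p_2$, and $p_3$ for which $X$ and $Y$ are independent.

(a) By Eq. (3.13), the marginal cdf of $X$ is given by

$$F_X(x) = F_{XY}(x, \infty) = \begin{cases} 0 & x < 0 \\ p_3 & 0 \le x < a \\ 1 & x \ge a \end{cases}$$

By Eq. (3.14), the marginal cdf of $Y$ is given by

$$F_Y(y) = F_{XY}(\infty, y) = \begin{cases} 0 & y < 0 \\ p_2 & 0 \le y < b \\ 1 & y \ge b \end{cases}$$

(b) For $X$ and $Y$ to be independent, by Eq. (3.4), we must have $F_{XY}(x, y) = F_X(x)F_Y(y)$. Thus, for $0 \le x < a$, $0 \le y < b$, we must have $p_1 = p_2 p_3$ for $X$ and $Y$ to be independent.

DISCRETE BIVARIATE RANDOM VARIABLES-JOINT PROBABILITY MASS FUNCTIONS

3.10. Verify Eq. (3.22).

If $X$ and $Y$ are independent, then by Eq. (1.46),

$$p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j) = P(X = x_i)P(Y = y_j) = p_X(x_i)\,p_Y(y_j)$$

3.11. Two fair dice are thrown. Consider a bivariate r.v. $(X, Y)$. Let $X = 0$ or 1 according to whether the first die shows an even number or an odd number of dots. Similarly, let $Y = 0$ or 1 according to the second die.

(a) Find the range $R_{XY}$ of $(X, Y)$.
(b) Find the joint pmf's of $(X, Y)$.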

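(a) The range of $(X, Y)$ is

$$R_{XY} = \{(0, 0), (0, 1), (1, 0), (1, 1)\}$$

(b) It is clear that $X$ and $Y$ are independent and

$$P(X = 0) = P(X = 1) = \tfrac{3}{6} = \tfrac{1}{2} \qquad P(Y = 0) = P(Y = 1) = \tfrac{3}{6} = \tfrac{1}{2}$$

Thus

$$p_{XY}(i, j) = P(X = i, Y = j) = P(X = i)P(Y = j) = \tfrac{1}{4} \qquad i, j = 0, 1$$

3.12. Consider the binary communication channel shown in Fig. 3-4 (Prob. 1.52). Let $(X, Y)$ be a bivariate r.v., where $X$ is the input to the channel and $Y$ is the output of the channel. Let $P(X = 0) = 0.5$, $P(Y = 1 \mid X = 0) = 0.1$, and $P(Y = 0 \mid X = 1) = 0.2$.

(a) Find the joint pmf's of $(X, Y)$.
(b) Find the marginal pmf's of $X$ and $Y$.
(c) Are $X$ and $Y$ independent?

(a) From the results of Prob. 1.52, we found that

$$P(X = 1) = 1 - P(X = 0) = 0.5 \qquad P(Y = 0 \mid X = 0) = 0.9 \qquad P(Y = 1 \mid X = 1) = 0.8$$

Then by Eq. (1.41), we obtain

$$P(X = 0, Y = 0) = P(Y = 0 \mid X = 0)P(X = 0) = 0.9(0.5) = 0.45$$
$$P(X = 0, Y = 1) = P(Y = 1 \mid X = 0)P(X = 0) = 0.1(0.5) = 0.05$$
$$P(X = 1, Y = 0) = P(Y = 0 \mid X = 1)P(X = 1) = 0.2(0.5) = 0.1$$
$$P(X = 1, Y = 1) = P(Y = 1 \mid X = 1)P(X = 1) = 0.8(0.5) = 0.4$$

Hence, the joint pmf's of $(X, Y)$ are

$$p_{XY}(0, 0) = 0.45 \qquad p_{XY}(0, 1) = 0.05 \qquad p_{XY}(1, 0) = 0.1 \qquad p_{XY}(1, 1) = 0.4$$

(b) By Eq. (3.20), the marginal pmf's of $X$ are

$$p_X(0) = \sum_{y_j} p_{XY}(0, y_j) = 0.45 + 0.05 = 0.5 \qquad p_X(1) = \sum_{y_j} p_{XY}(1, y_j) = 0.1 + 0.4 = 0.5$$

By Eq. (3.21), the marginal pmf's of $Y$ are

$$p_Y(0) = \sum_{x_i} p_{XY}(x_i, 0) = 0.45 + 0.1 = 0.55 \qquad p_Y(1) = \sum_{x_i} p_{XY}(x_i, 1) = 0.05 + 0.4 = 0.45$$

Fig. 3-4 Binary communication channel.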

(c) Now

$$p_X(0)\,p_Y(0) = 0.5(0.55) = 0.275 \ne p_{XY}(0, 0) = 0.45$$

Hence $X$ and $Y$ are not independent.

3.13. Consider an experiment of drawing randomly three balls from an urn containing two red, three white, and four blue balls. Let $(X, Y)$ be a bivariate r.v. where $X$ and $Y$ denote, respectively, the number of red and white balls chosen.

(a) Find the range of $(X, Y)$.
(b) Find the joint pmf's of $(X, Y)$.
(c) Find the marginal pmf's of $X$ and $Y$.
(d) Are $X$ and $Y$ independent?

(a) The range of $(X, Y)$ is given by

$$R_{XY} = \{(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)\}$$

(b) The joint pmf's of $(X, Y)$,

$$p_{XY}(i, j) = P(X = i, Y = j) = \frac{\binom{2}{i}\binom{3}{j}\binom{4}{3-i-j}}{\binom{9}{3}} \qquad i = 0, 1, 2 \quad j = 0, 1, 2, 3$$

are expressed in tabular form as in Table 3.1.

(c) The marginal pmf's of $X$ are obtained from Table 3.1 by computing the row sums, and the marginal pmf's of $Y$ are obtained by computing the column sums. Thus

$$p_X(0) = \tfrac{35}{84} \qquad p_X(1) = \tfrac{42}{84} \qquad p_X(2) = \tfrac{7}{84}$$

$$p_Y(0) = \tfrac{20}{84} \qquad p_Y(1) = \tfrac{45}{84} \qquad p_Y(2) = \tfrac{18}{84} \qquad p_Y(3) = \tfrac{1}{84}$$

Table 3.1 $p_{XY}(i, j)$

  i \ j |   0   |   1   |   2   |   3
    0   | 4/84  | 18/84 | 12/84 | 1/84
    1   | 12/84 | 24/84 | 6/84  | 0
    2   | 4/84  | 3/84  | 0     | 0
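The entries of Table 3.1 can be cross-checked by direct counting. The following is a small sketch, assuming the joint pmf has the counting form "ways to pick $i$ red, $j$ white, and the rest blue, divided by the ways to pick any three balls"; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

RED, WHITE, BLUE, DRAWN = 2, 3, 4, 3
total = comb(RED + WHITE + BLUE, DRAWN)  # number of ways to choose 3 of the 9 balls

# Joint pmf p_XY(i, j): i red, j white, and the remaining draws blue.
joint = {}
for i in range(RED + 1):
    for j in range(WHITE + 1):
        blue = DRAWN - i - j
        if 0 <= blue <= BLUE:
            ways = comb(RED, i) * comb(WHITE, j) * comb(BLUE, blue)
            joint[(i, j)] = Fraction(ways, total)

# Marginal pmf's: row sums for X, column sums for Y.
p_x = {i: sum(p for (a, _), p in joint.items() if a == i) for i in range(RED + 1)}
p_y = {j: sum(p for (_, b), p in joint.items() if b == j) for j in range(WHITE + 1)}

# X and Y are independent only if the joint pmf factors at every point.
independent = all(joint.get((i, j), 0) == p_x[i] * p_y[j] for i in p_x for j in p_y)

print(joint)
print(p_x, p_y, independent)
```

Using `Fraction` keeps the arithmetic exact (values print in lowest terms, for example 1/21 for 4/84), so the independence test is an exact comparison rather than a floating-point one.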

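(d) Since

$$p_{XY}(0, 0) = \tfrac{4}{84} \ne p_X(0)\,p_Y(0) = \tfrac{35}{84}\left(\tfrac{20}{84}\right)$$

$X$ and $Y$ are not independent.

3.14. The joint pmf of a bivariate r.v. $(X, Y)$ is given by

$$p_{XY}(x_i, y_j) = \begin{cases} k(2x_i + y_j) & x_i = 1, 2;\ y_j = 1, 2 \\ 0 & \text{otherwise} \end{cases}$$

where $k$ is a constant.

(a) Find the value of $k$.
(b) Find the marginal pmf's of $X$ and $Y$.
(c) Are $X$ and $Y$ independent?

(a) By Eq. (3.17),

$$\sum_{x_i} \sum_{y_j} p_{XY}(x_i, y_j) = k[(2 + 1) + (2 + 2) + (4 + 1) + (4 + 2)] = 18k = 1$$

Thus, $k = \frac{1}{18}$.

(b) By Eq. (3.20), the marginal pmf's of $X$ are

$$p_X(x_i) = \sum_{y_j} p_{XY}(x_i, y_j) = \tfrac{1}{18}(2x_i + 1) + \tfrac{1}{18}(2x_i + 2) = \tfrac{1}{18}(4x_i + 3) \qquad x_i = 1, 2$$

By Eq. (3.21), the marginal pmf's of $Y$ are

$$p_Y(y_j) = \sum_{x_i} p_{XY}(x_i, y_j) = \tfrac{1}{18}(2 + y_j) + \tfrac{1}{18}(4 + y_j) = \tfrac{1}{18}(2y_j + 6) \qquad y_j = 1, 2$$

(c) Now $p_X(x_i)\,p_Y(y_j) \ne p_{XY}(x_i, y_j)$; hence $X$ and $Y$ are not independent.

3.15. The joint pmf of a bivariate r.v. $(X, Y)$ is given by

$$p_{XY}(x_i, y_j) = \begin{cases} k x_i^2 y_j & x_i = 1, 2;\ y_j = 1, 2, 3 \\ 0 & \text{otherwise} \end{cases}$$

where $k$ is a constant.

(a) Find the value of $k$.
(b) Find the marginal pmf's of $X$ and $Y$.
(c) Are $X$ and $Y$ independent?

(a) By Eq. (3.17),

$$\sum_{x_i} \sum_{y_j} p_{XY}(x_i, y_j) = k(1 + 4)(1 + 2 + 3) = 30k = 1$$

Thus, $k = \frac{1}{30}$.

(b) By Eq. (3.20), the marginal pmf's of $X$ are

$$p_X(x_i) = \sum_{y_j} \tfrac{1}{30} x_i^2 y_j = \tfrac{1}{5} x_i^2 \qquad x_i = 1, 2$$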

105 CHAP. 31 MULTIPLE RANDOM VARIABLES By Eq. (3.21), the marginal pmf's of Y are (c) Now Hence X and Y are independent. Px(xi)P~(Yj) = &ixi2yj = PxAxi Yj) Consider an experiment of tossing two coins three times. Coin A is fair, but coin B is not fair, with P(H) = and P(T) = $. Consider a bivariate r.v. (X, Y), where X denotes the number of heads resulting from coin A and Y denotes the number of heads resulting from coin B. (a) Find the range of (X, Y). (b) Find the joint pmf's of (X, Y). (c) Find P(X = Y), P(X > Y), and P(X + Y 1 4). (a) The range of (X, Y) is given by Rxy = {(i, j): i, j = 0, 1, 2, 3) (b) It is clear that the r.v.'s X and Y are independent, and they are both binomial r.v.'s with parameters (n, p) = (3, 4) and (n, p) = (3, $), respectively. Thus, by Eq. (2.36), we have Since X and Y are independent, the joint pmf's of (X, Y) are which are tabulated in Table 3.2. (c) From Table 3.2, we have Table 3.2 pm(i, J)

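Thus, $P(X + Y \le 4) = 1 - P(X + Y > 4)$.

CONTINUOUS BIVARIATE RANDOM VARIABLES-PROBABILITY DENSITY FUNCTIONS

3.17. The joint pdf of a bivariate r.v. $(X, Y)$ is given by

$$f_{XY}(x, y) = \begin{cases} k(x + y) & 0 < x < 2,\ 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

where $k$ is a constant.

(a) Find the value of $k$.
(b) Find the marginal pdf's of $X$ and $Y$.
(c) Are $X$ and $Y$ independent?

(a) By Eq. (3.26),

$$\int_0^2 \int_0^2 k(x + y)\, dx\, dy = k \int_0^2 (2 + 2y)\, dy = 8k = 1$$

Thus $k = \frac{1}{8}$.

(b) By Eq. (3.30), the marginal pdf of $X$ is

$$f_X(x) = \int_0^2 \tfrac{1}{8}(x + y)\, dy = \tfrac{1}{8}\left[xy + \tfrac{y^2}{2}\right]_{y=0}^{y=2} = \begin{cases} \tfrac{1}{4}(x + 1) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

Since $f_{XY}(x, y)$ is symmetric with respect to $x$ and $y$, the marginal pdf of $Y$ is

$$f_Y(y) = \begin{cases} \tfrac{1}{4}(y + 1) & 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

(c) Since $f_{XY}(x, y) \ne f_X(x) f_Y(y)$, $X$ and $Y$ are not independent.

3.18. The joint pdf of a bivariate r.v. $(X, Y)$ is given by

$$f_{XY}(x, y) = \begin{cases} kxy & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

where $k$ is a constant.

(a) Find the value of $k$.
(b) Are $X$ and $Y$ independent?
(c) Find $P(X + Y < 1)$.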

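Fig. 3-5

(a) The range space $R_{XY}$ is shown in Fig. 3-5(a). By Eq. (3.26),

$$\int_0^1 \int_0^1 kxy\, dx\, dy = k \int_0^1 y \left[\tfrac{x^2}{2}\right]_0^1 dy = \tfrac{k}{4} = 1$$

Thus $k = 4$.

(b) To determine whether $X$ and $Y$ are independent, we must find the marginal pdf's of $X$ and $Y$. By Eq. (3.30),

$$f_X(x) = \begin{cases} \int_0^1 4xy\, dy = 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

By symmetry,

$$f_Y(y) = \begin{cases} 2y & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Since $f_{XY}(x, y) = f_X(x) f_Y(y)$, $X$ and $Y$ are independent.

(c) The region in the $xy$ plane corresponding to the event $(X + Y < 1)$ is shown in Fig. 3-5(b) as a shaded area. Then

$$P(X + Y < 1) = \int_0^1 \int_0^{1-y} 4xy\, dx\, dy = \int_0^1 4y \left[\tfrac{x^2}{2}\right]_0^{1-y} dy = \int_0^1 4y\left[\tfrac{1}{2}(1 - y)^2\right] dy = \tfrac{1}{6}$$

3.19. The joint pdf of a bivariate r.v. $(X, Y)$ is given by

$$f_{XY}(x, y) = \begin{cases} kxy & 0 < x < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

where $k$ is a constant.

(a) Find the value of $k$.
(b) Are $X$ and $Y$ independent?

(a) The range space $R_{XY}$ is shown in Fig. 3-6. By Eq. (3.26),

$$\int_0^1 \int_0^y kxy\, dx\, dy = k \int_0^1 y \left[\tfrac{x^2}{2}\right]_0^y dy = k \int_0^1 \tfrac{y^3}{2}\, dy = \tfrac{k}{8} = 1$$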

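Fig. 3-6

Thus $k = 8$.

(b) By Eq. (3.30), the marginal pdf of $X$ is

$$f_X(x) = \begin{cases} \int_x^1 8xy\, dy = 4x(1 - x^2) & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

By Eq. (3.31), the marginal pdf of $Y$ is

$$f_Y(y) = \begin{cases} \int_0^y 8xy\, dx = 4y^3 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Since $f_{XY}(x, y) \ne f_X(x) f_Y(y)$, $X$ and $Y$ are not independent.

Note that if the range space $R_{XY}$ depends functionally on $x$ or $y$, then $X$ and $Y$ cannot be independent r.v.'s.

3.20. The joint pdf of a bivariate r.v. $(X, Y)$ is given by

$$f_{XY}(x, y) = \begin{cases} k & 0 < y \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$

where $k$ is a constant.

(a) Determine the value of $k$.
(b) Find the marginal pdf's of $X$ and $Y$.
(c) Find $P(0 < X < \frac{1}{2}, 0 < Y < \frac{1}{2})$.

(a) The range space $R_{XY}$ is shown in Fig. 3-7. By Eq. (3.26),

$$\iint_{R_{XY}} k\, dx\, dy = k \times \text{area}(R_{XY}) = k\left(\tfrac{1}{2}\right) = 1$$

Thus $k = 2$.

(b) By Eq. (3.30), the marginal pdf of $X$ is

$$f_X(x) = \begin{cases} \int_0^x 2\, dy = 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

By Eq. (3.31), the marginal pdf of $Y$ is

$$f_Y(y) = \begin{cases} \int_y^1 2\, dx = 2(1 - y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$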

Fig. 3-7

(c) The region in the $xy$ plane corresponding to the event $(0 < X < \frac{1}{2}, 0 < Y < \frac{1}{2})$ is shown in Fig. 3-7 as the shaded area $R_s$. Then

$$P(0 < X < \tfrac{1}{2}, 0 < Y < \tfrac{1}{2}) = \iint_{R_s} f_{XY}(x, y)\, dx\, dy = 2 \times \text{area}(R_s) = 2\left(\tfrac{1}{8}\right) = \tfrac{1}{4}$$

Note that the bivariate r.v. $(X, Y)$ is said to be uniformly distributed over the region $R_{XY}$ if its pdf is

$$f_{XY}(x, y) = \begin{cases} k & (x, y) \in R_{XY} \\ 0 & \text{otherwise} \end{cases}$$

where $k$ is a constant. Then by Eq. (3.26), the constant $k$ must be $k = 1/(\text{area of } R_{XY})$.

3.21. Suppose we select one point at random from within the circle with radius $R$. If we let the center of the circle denote the origin and define $X$ and $Y$ to be the coordinates of the point chosen (Fig. 3-8), then $(X, Y)$ is a uniform bivariate r.v. with joint pdf given by

$$f_{XY}(x, y) = \begin{cases} k & x^2 + y^2 \le R^2 \\ 0 & x^2 + y^2 > R^2 \end{cases}$$

where $k$ is a constant.

(a) Determine the value of $k$.
(b) Find the marginal pdf's of $X$ and $Y$.
(c) Find the probability that the distance from the origin of the point selected is not greater than $a$.

Fig. 3-8

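(a) By Eq. (3.26),

$$\iint_{x^2 + y^2 \le R^2} k\, dx\, dy = k(\pi R^2) = 1$$

Thus $k = 1/\pi R^2$.

(b) By Eq. (3.30), the marginal pdf of $X$ is

$$f_X(x) = \frac{1}{\pi R^2} \int_{-\sqrt{R^2 - x^2}}^{\sqrt{R^2 - x^2}} dy = \frac{2}{\pi R^2} \sqrt{R^2 - x^2} \qquad x^2 \le R^2$$

Hence

$$f_X(x) = \begin{cases} \dfrac{2}{\pi R^2} \sqrt{R^2 - x^2} & |x| \le R \\ 0 & |x| > R \end{cases}$$

By symmetry, the marginal pdf of $Y$ is

$$f_Y(y) = \begin{cases} \dfrac{2}{\pi R^2} \sqrt{R^2 - y^2} & |y| \le R \\ 0 & |y| > R \end{cases}$$

(c) For $0 \le a \le R$,

$$P(X^2 + Y^2 \le a^2) = \iint_{x^2 + y^2 \le a^2} f_{XY}(x, y)\, dx\, dy = \frac{\pi a^2}{\pi R^2} = \frac{a^2}{R^2}$$

3.22. The joint pdf of a bivariate r.v. $(X, Y)$ is given by

$$f_{XY}(x, y) = \begin{cases} k e^{-(ax + by)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $a$ and $b$ are positive constants and $k$ is a constant.

(a) Determine the value of $k$.
(b) Are $X$ and $Y$ independent?

(a) By Eq. (3.26),

$$\int_0^\infty \int_0^\infty k e^{-(ax + by)}\, dx\, dy = k \int_0^\infty e^{-ax}\, dx \int_0^\infty e^{-by}\, dy = \frac{k}{ab} = 1$$

Thus $k = ab$.

(b) By Eq. (3.30), the marginal pdf of $X$ is

$$f_X(x) = ab\, e^{-ax} \int_0^\infty e^{-by}\, dy = a e^{-ax} \qquad x > 0$$

By Eq. (3.31), the marginal pdf of $Y$ is

$$f_Y(y) = ab\, e^{-by} \int_0^\infty e^{-ax}\, dx = b e^{-by} \qquad y > 0$$

Since $f_{XY}(x, y) = f_X(x) f_Y(y)$, $X$ and $Y$ are independent.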
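For the point chosen uniformly within a circle of radius $R$, the probability of landing within distance $a$ of the center is the ratio of the two disc areas, $a^2/R^2$. A quick simulation can serve as a sanity check; the sketch below is a rough Monte Carlo estimate that assumes rejection sampling from the bounding square, with hypothetical function names and an arbitrary seed.

```python
import random

def sample_disc(radius, rng):
    """Draw a point uniformly from the disc by rejecting square samples that fall outside it."""
    while True:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return x, y

def estimate_within(a, radius, n, seed=0):
    """Estimate P(X^2 + Y^2 <= a^2) from n simulated points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = sample_disc(radius, rng)
        if x * x + y * y <= a * a:
            hits += 1
    return hits / n

estimate = estimate_within(a=1.0, radius=2.0, n=100_000)
print(estimate)  # should be close to a^2 / R^2 = 0.25
```

Rejection sampling is used because drawing the radius and the angle independently and uniformly would concentrate points near the center and would not give the uniform joint pdf.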

111 CHAP. 31 MULTIPLE RANDOM VARIABLES A manufacturer has been using two different manufacturing processes to make computer memory chips. Let (X, Y) be a bivariate r.v., where X denotes the time to failure of chips made by process A and Y denotes the time to failure of chips rnade by process B. Assuming that the joint pdf of (X, Y) is where a = and b = l.2(10-4), determine P(X > Y). The region in the xy plane corresponding to the event (X > Y) is shown in Fig. 3-9 as the shaded area. Then Fig A smooth-surface table is ruled with equidistant parallel lines a distance D apart. A needle of length L, where L I D, is randomly dropped onto this table. What is the probability that the needle will intersect one of the lines? (This is known as BufJbn's needle problem.) We can determine the needle's position by specifying a bivariate r.v. (X, 0), where X is the distance from the middle point of the needle to the nearest parallel line: and O is the angle from the vertical to the needle (Fig. 3-10). We interpret the statement "the needle is randomly dropped to mean that both X and O have uniform distributions and that X and O are independent. The possible values of X are between 0 and Fig Buffon's needle problem.

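$D/2$, and the possible values of $\Theta$ are between 0 and $\pi/2$. Thus, the joint pdf of $(X, \Theta)$ is

$$f_{X\Theta}(x, \theta) = f_X(x) f_\Theta(\theta) = \begin{cases} \dfrac{4}{\pi D} & 0 \le x \le D/2,\ 0 \le \theta \le \pi/2 \\ 0 & \text{otherwise} \end{cases}$$

From Fig. 3-10, we see that the condition for the needle to intersect a line is $X < (L/2) \cos \theta$. Thus, the probability that the needle will intersect a line is

$$P\left(X < \tfrac{L}{2} \cos \theta\right) = \int_0^{\pi/2} \int_0^{(L/2) \cos \theta} \frac{4}{\pi D}\, dx\, d\theta = \frac{4}{\pi D} \int_0^{\pi/2} \frac{L}{2} \cos \theta\, d\theta = \frac{2L}{\pi D}$$

CONDITIONAL DISTRIBUTIONS

3.25. Verify Eqs. (3.36) and (3.41).

(a) By Eqs. (3.33) and (3.20),

$$\sum_{y_j} p_{Y|X}(y_j \mid x_i) = \frac{\sum_{y_j} p_{XY}(x_i, y_j)}{p_X(x_i)} = \frac{p_X(x_i)}{p_X(x_i)} = 1$$

(b) Similarly, by Eqs. (3.38) and (3.30),

$$\int_{-\infty}^{\infty} f_{Y|X}(y \mid x)\, dy = \frac{\int_{-\infty}^{\infty} f_{XY}(x, y)\, dy}{f_X(x)} = \frac{f_X(x)}{f_X(x)} = 1$$

3.26. Consider the bivariate r.v. $(X, Y)$ of Prob. 3.14.

(a) Find the conditional pmf's $p_{Y|X}(y_j \mid x_i)$ and $p_{X|Y}(x_i \mid y_j)$.
(b) Find $P(Y = 2 \mid X = 2)$ and $P(X = 2 \mid Y = 2)$.

(a) From the results of Prob. 3.14, we have

$$p_{XY}(x_i, y_j) = \begin{cases} \tfrac{1}{18}(2x_i + y_j) & x_i = 1, 2;\ y_j = 1, 2 \\ 0 & \text{otherwise} \end{cases}$$

$$p_X(x_i) = \tfrac{1}{18}(4x_i + 3) \quad x_i = 1, 2 \qquad p_Y(y_j) = \tfrac{1}{18}(2y_j + 6) \quad y_j = 1, 2$$

Thus, by Eqs. (3.33) and (3.34),

$$p_{Y|X}(y_j \mid x_i) = \frac{2x_i + y_j}{4x_i + 3} \qquad y_j = 1, 2;\ x_i = 1, 2$$

$$p_{X|Y}(x_i \mid y_j) = \frac{2x_i + y_j}{2y_j + 6} \qquad x_i = 1, 2;\ y_j = 1, 2$$

(b) Using the results of part (a), we obtain

$$P(Y = 2 \mid X = 2) = p_{Y|X}(2 \mid 2) = \frac{2(2) + 2}{4(2) + 3} = \frac{6}{11} \qquad P(X = 2 \mid Y = 2) = p_{X|Y}(2 \mid 2) = \frac{2(2) + 2}{2(2) + 6} = \frac{3}{5}$$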

113 CHAP. 31 MULTIPLE RANDOM VARIABLES Find the conditional pmf's pylx(yj I xi) and pxiy(xi I yj) for the bivariate r.v. (X, Y ) of Prob From the results of Prob. 3.15, we have Thus, by Eqs. (3.33) and (3.34), px(xj = +xi2 Xi = 1, 2 payj)=iyj xi = 1, 2; yj = 1, 2, 3 otherwise Yj=1,2,3 Note that PYlAyj 1 xi) = py(yj) and pxl& 1 yj) = px(xi), as must be the case since X and Y are independent, as shown in Prob Consider the bivariate r.v. (X, Y) of Prob (a) Find the conditional pdf's f &(y I x) and fxl,(x 1 y). (b) Find P(0 < Y < ) I X = 1). (a) From the results of Prob. 3.17, we have Thus, by Eqs. (3.38) and (3.39), o<x<2,o<y<2 otherwise fx(x) = t(x + 1) 0 < x < 2 fdy) = b(y + 1) 0 < Y < 2 (b) Using the results of part (a), we obtain Find the conditional pdf's fy lx(y ( x) and fxl,(x ( y) for the bivariate r.v. (X, Y ) of Prob From the results of Prob. 3.18, we have Thus, by Eqs. (3.38) and (3.39), O<x<l,O<y<l otherwise fdx) = 2x 0 < x < 1 fv(y) = 2~ 0 < Y < 1

114 MULTIPLE RANDOM VARIABLES [CHAP 3 Again note that fyldy I x) = fy(y) and fxldx I y) = fx(x), as must be the case since X and Y are independent, as shown in Prob Find the conditional pdf's fylx(y I x) and fxly(x I y) for the bivariate r.v. (X, Y) of Prob From the results of Prob. 3.20, we have i 2 O<ylx<l fxy(x' = 0 otherwise fx(x) = 2x O<x<l fy(y) = 2(1 - Y) 0 < Y < 1 Thus, by Eqs. (3.38) and (3.39), 1 fr,x(y I x) = ; y<x<l,o<x<l The joint pdf of a bivariate r.v. (X, Y) is given by x>o,y>o (a) Show that f,,(x, y) satisfies Eq. (3.26). (b) Find P(X > 1 I Y = y). (a) We have otherwise (b) First we must find the marginal pdf on Y. By Eq. (3.31), By Eq. (3.39), the conditional pdf of X is fxlu(x I Y) = - x>o,y>o otherwise Then

115 CHAP. 31 MULTIPLE RANDOM VARIABLES COVARIANCE AND CORRELATION COEFFICIENTS Let (X, Y) be a bivariate r.v. If X and Y are independent, show that X and Y are uncorrelated. If (X, Y) is a discrete bivariate r.v., then by Eqs. (3.43) and (3.22)' If (X, Y) is a continuous bivariate r.v., then by Eqs. (3.43) and (3.32), Thus, X and Y are uncorrelated by Eq. (3.52) Suppose the joint pmf of a bivariate r.v. (X, Y) is given by (a) Are X and Y independent? (b) Are X and Y uncorrelated? (a) By Eq. (3.20), the marginal pmf's of X are By Eq. (3.21), the marginal pmf's of Y are (b) Thus X and Y are not independent. By Eq.s (3.45a), (3.45b), and (3.43), we have Now by Eq. (3.51),

116 MULTIPLE RANDOM VARIABLES [CHAP 3 Thus, X and Y are uncorrelated Let (X, Y) be a bivariate r.v. with the joint pdf Show that X and Y are not independent but are uncorrelated. By Eq. (3.30), the marginal pdf of X is Noting that the integrand of the first integral in the above expression is the pdf of N(0; 1) and the second integral in the above expression is the variance of N(0; I), we have Since fx,(x, y) is symmetric in x and y, we have I Now fx,(x, y) # fx(x) fu(y), and hence X and Y are not independent. Next, by Eqs. (3.47a) and (3.473), since for each integral the integrand is an odd function. By Eq. (3.43), The integral vanishes because the contributions of the second and the fourth quadrants cancel those of the first and the third. Thus, E(XY) = E(X)E(Y), and so X and Y are uncorrelated Let (X, Y) be a bivariate r.v. Show that [E(xy)125 E(x2)E(y2) This is known as the Cauchy-Schwarz inequality. Consider the expression E[(X - for any two r.v.'s X and Y and a real variable a. This expression, when viewed as a quadratic in a, is greater than or equal to zero; that is, E [(X - a Y)2] 2 0 for any value of a. Expanding this, we obtain E(X2) - 2aE(XY) + a2e(y2) 2 0 Choose a value of a for which the left-hand side of this inequality is minimum,

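$$a = \frac{E(XY)}{E(Y^2)}$$
which results in the inequality
$$E(X^2) - \frac{[E(XY)]^2}{E(Y^2)} \ge 0 \qquad \text{or} \qquad [E(XY)]^2 \le E(X^2)E(Y^2)$$

Verify Eq. (3.54).

From the Cauchy-Schwarz inequality [Eq. (3.97)], we have
$$\{E[(X - \mu_X)(Y - \mu_Y)]\}^2 \le E[(X - \mu_X)^2]\,E[(Y - \mu_Y)^2] \qquad \text{or} \qquad \sigma_{XY}^2 \le \sigma_X^2 \sigma_Y^2$$
Then
$$\rho_{XY}^2 = \frac{\sigma_{XY}^2}{\sigma_X^2 \sigma_Y^2} \le 1$$
Since $\rho_{XY}$ is a real number, this implies $|\rho_{XY}| \le 1$ or $-1 \le \rho_{XY} \le 1$.

Let (X, Y) be the bivariate r.v. of Prob. 3.12. (a) Find the mean and the variance of X. (b) Find the mean and the variance of Y. (c) Find the covariance of X and Y. (d) Find the correlation coefficient of X and Y.

(a) From the results of Prob. 3.12, the mean and the variance of X are evaluated as follows:
$$E(X) = \sum_{x_i} x_i p_X(x_i) = (1)\tfrac{7}{18} + (2)\tfrac{11}{18} = \tfrac{29}{18} \qquad E(X^2) = (1)^2\tfrac{7}{18} + (2)^2\tfrac{11}{18} = \tfrac{51}{18}$$
$$\sigma_X^2 = E(X^2) - [E(X)]^2 = \tfrac{51}{18} - \left(\tfrac{29}{18}\right)^2 = \tfrac{77}{324}$$
(b) Similarly, the mean and the variance of Y are
$$E(Y) = (1)\tfrac{8}{18} + (2)\tfrac{10}{18} = \tfrac{14}{9} \qquad E(Y^2) = (1)^2\tfrac{8}{18} + (2)^2\tfrac{10}{18} = \tfrac{48}{18}$$
$$\sigma_Y^2 = E(Y^2) - [E(Y)]^2 = \tfrac{48}{18} - \left(\tfrac{14}{9}\right)^2 = \tfrac{20}{81}$$
(c) The joint moment is
$$E(XY) = \sum_{y_j}\sum_{x_i} x_i y_j p_{XY}(x_i, y_j) = \tfrac{1}{18}[(1)(1)(3) + (1)(2)(4) + (2)(1)(5) + (2)(2)(6)] = \tfrac{45}{18}$$
By Eq. (3.51), the covariance of X and Y is
$$\text{Cov}(X, Y) = E(XY) - E(X)E(Y) = \tfrac{45}{18} - \left(\tfrac{29}{18}\right)\left(\tfrac{14}{9}\right) = -\tfrac{1}{162}$$
(d) By Eq. (3.53), the correlation coefficient of X and Y is
$$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{-1/162}{\sqrt{(77/324)(20/81)}} \approx -0.025$$

Suppose that a bivariate r.v. (X, Y) is uniformly distributed over a unit circle (Prob. 3.21).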

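(a) Are X and Y independent? (b) Are X and Y correlated?

(a) Setting R = 1 in the results of Prob. 3.21, we obtain
$$f_{XY}(x, y) = \begin{cases} \dfrac{1}{\pi} & x^2 + y^2 < 1 \\ 0 & \text{otherwise} \end{cases}$$
$$f_X(x) = \frac{2}{\pi}\sqrt{1 - x^2} \quad |x| < 1 \qquad f_Y(y) = \frac{2}{\pi}\sqrt{1 - y^2} \quad |y| < 1$$
Since $f_{XY}(x, y) \ne f_X(x) f_Y(y)$, X and Y are not independent.

(b) By Eqs. (3.47a) and (3.47b), the means of X and Y are
$$E(X) = \frac{2}{\pi}\int_{-1}^{1} x\sqrt{1 - x^2}\,dx = 0 \qquad E(Y) = \frac{2}{\pi}\int_{-1}^{1} y\sqrt{1 - y^2}\,dy = 0$$
since each integrand is an odd function. Next, by Eq. (3.43),
$$E(XY) = \frac{1}{\pi}\iint_{x^2 + y^2 < 1} xy\,dx\,dy = 0$$
The integral vanishes because the contributions of the second and the fourth quadrants cancel those of the first and the third. Hence, E(XY) = E(X)E(Y) = 0 and X and Y are uncorrelated.

CONDITIONAL MEANS AND CONDITIONAL VARIANCES

Consider the bivariate r.v. (X, Y) of Prob. 3.26. Compute the conditional mean and the conditional variance of Y given $x_i = 2$.

From Prob. 3.26, the conditional pmf $p_{Y|X}(y_j \mid x_i)$ is
$$p_{Y|X}(y_j \mid x_i) = \frac{2x_i + y_j}{4x_i + 3} \qquad y_j = 1, 2;\ x_i = 1, 2$$
Thus,
$$p_{Y|X}(y_j \mid 2) = \frac{4 + y_j}{11} \qquad y_j = 1, 2$$
and by Eqs. (3.55) and (3.56), the conditional mean and the conditional variance of Y given $x_i = 2$ are
$$\mu_{Y|2} = E(Y \mid x_i = 2) = \sum_{y_j} y_j \frac{4 + y_j}{11} = (1)\tfrac{5}{11} + (2)\tfrac{6}{11} = \tfrac{17}{11}$$
$$\sigma_{Y|2}^2 = E[(Y - \mu_{Y|2})^2 \mid x_i = 2] = \sum_{y_j}\left(y_j - \tfrac{17}{11}\right)^2 \frac{4 + y_j}{11} = \left(-\tfrac{6}{11}\right)^2\tfrac{5}{11} + \left(\tfrac{5}{11}\right)^2\tfrac{6}{11} = \tfrac{30}{121}$$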
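The conditional mean and variance just computed can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the conditional pmf $(2x_i + y_j)/(4x_i + 3)$ quoted from Prob. 3.26; the function names are illustrative and are not part of the text.

```python
from fractions import Fraction


def conditional_pmf_y_given_x(x, ys=(1, 2)):
    """Return p_{Y|X}(y | x) = (2x + y) / (4x + 3) for each y in ys."""
    return {y: Fraction(2 * x + y, 4 * x + 3) for y in ys}


def conditional_mean_and_variance(pmf):
    """Mean and variance of a discrete pmf given as {value: probability}."""
    mean = sum(y * p for y, p in pmf.items())
    var = sum((y - mean) ** 2 * p for y, p in pmf.items())
    return mean, var


pmf = conditional_pmf_y_given_x(2)
mean, var = conditional_mean_and_variance(pmf)
print("pmf given x = 2:", pmf)
print("conditional mean:", mean)
print("conditional variance:", var)
```

Using `Fraction` avoids rounding, so the results can be compared directly with the hand computation.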

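Let (X, Y) be the bivariate r.v. of Prob. 3.30. Compute the conditional means E(Y | x) and E(X | y).

From Prob. 3.30,
$$f_{Y|X}(y \mid x) = \frac{1}{x} \quad y < x < 1,\ 0 < x < 1 \qquad f_{X|Y}(x \mid y) = \frac{1}{1 - y} \quad y < x < 1,\ 0 < y < 1$$
By Eq. (3.58), the conditional mean of Y, given X = x, is
$$E(Y \mid x) = \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x)\,dy = \int_0^x y\,\frac{1}{x}\,dy = \frac{x}{2} \qquad 0 < x < 1$$
Similarly, the conditional mean of X, given Y = y, is
$$E(X \mid y) = \int_{-\infty}^{\infty} x f_{X|Y}(x \mid y)\,dx = \int_y^1 x\,\frac{1}{1 - y}\,dx = \frac{1 + y}{2} \qquad 0 < y < 1$$
Note that E(Y | x) is a function of x only and E(X | y) is a function of y only.

Let (X, Y) be the bivariate r.v. of Prob. 3.30. Compute the conditional variances Var(Y | x) and Var(X | y).

Using the results of the preceding problem and Eq. (3.59), the conditional variance of Y, given X = x, is
$$\text{Var}(Y \mid x) = E\{[Y - E(Y \mid x)]^2 \mid x\} = \int_0^x \left(y - \frac{x}{2}\right)^2 \frac{1}{x}\,dy = \frac{x^2}{12}$$
Similarly, the conditional variance of X, given Y = y, is
$$\text{Var}(X \mid y) = E\{[X - E(X \mid y)]^2 \mid y\} = \int_y^1 \left(x - \frac{1 + y}{2}\right)^2 \frac{1}{1 - y}\,dx = \frac{(1 - y)^2}{12}$$

N-DIMENSIONAL RANDOM VECTORS

Let $(X_1, X_2, X_3, X_4)$ be a four-dimensional random vector, where $X_k$ (k = 1, 2, 3, 4) are independent Poisson r.v.'s with parameter 2. (a) Find $P(X_1 = 1, X_2 = 3, X_3 = 2, X_4 = 1)$. (b) Find the probability that exactly one of the $X_k$'s equals zero.

(a) By Eq. (2.40), the pmf of $X_k$ is
$$p_{X_k}(i) = P(X_k = i) = e^{-2}\frac{2^i}{i!} \qquad i = 0, 1, \ldots \qquad (3.98)$$
Since the $X_k$'s are independent, by Eq. (3.80),
$$P(X_1 = 1, X_2 = 3, X_3 = 2, X_4 = 1) = p_{X_1}(1)\,p_{X_2}(3)\,p_{X_3}(2)\,p_{X_4}(1) = \left(\frac{e^{-2}2}{1!}\right)\left(\frac{e^{-2}2^3}{3!}\right)\left(\frac{e^{-2}2^2}{2!}\right)\left(\frac{e^{-2}2}{1!}\right) = \frac{e^{-8}2^7}{12} \approx 3.58 \times 10^{-3}$$
(b) First, we find the probability that $X_k = 0$, k = 1, 2, 3, 4. From Eq. (3.98),
$$P(X_k = 0) = e^{-2} \qquad k = 1, 2, 3, 4$$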

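Next, we treat zero as "success." If Y denotes the number of successes, then Y is a binomial r.v. with parameters $(n, p) = (4, e^{-2})$. Thus, the probability that exactly one of the $X_k$'s equals zero is given by the binomial pmf:
$$P(Y = 1) = \binom{4}{1} e^{-2}(1 - e^{-2})^3 \approx 0.35$$

Let (X, Y, Z) be a trivariate r.v., where X, Y, and Z are independent uniform r.v.'s over (0, 1). Compute $P(Z \ge XY)$.

Since X, Y, Z are independent and uniformly distributed over (0, 1), we have
$$f_{XYZ}(x, y, z) = f_X(x) f_Y(y) f_Z(z) = \begin{cases} 1 & 0 < x < 1,\ 0 < y < 1,\ 0 < z < 1 \\ 0 & \text{otherwise} \end{cases}$$
Then
$$P(Z \ge XY) = \iiint_{z \ge xy} f_{XYZ}(x, y, z)\,dx\,dy\,dz = \int_0^1\int_0^1\int_{xy}^1 dz\,dy\,dx = \int_0^1\int_0^1 (1 - xy)\,dy\,dx = \int_0^1 \left(1 - \frac{x}{2}\right)dx = \frac{3}{4}$$

Let (X, Y, Z) be a trivariate r.v. with joint pdf
$$f_{XYZ}(x, y, z) = \begin{cases} k e^{-(ax + by + cz)} & x > 0,\ y > 0,\ z > 0 \\ 0 & \text{otherwise} \end{cases}$$
where a, b, c > 0 and k are constants. (a) Determine the value of k. (b) Find the marginal joint pdf of X and Y. (c) Find the marginal pdf of X. (d) Are X, Y, and Z independent?

(a) Since the joint pdf must integrate to unity,
$$\int_0^\infty\int_0^\infty\int_0^\infty k e^{-(ax + by + cz)}\,dx\,dy\,dz = k\int_0^\infty e^{-ax}\,dx\int_0^\infty e^{-by}\,dy\int_0^\infty e^{-cz}\,dz = \frac{k}{abc} = 1$$
Thus k = abc.

(b) By Eq. (3.77), the marginal joint pdf of X and Y is
$$f_{XY}(x, y) = \int_0^\infty abc\,e^{-(ax + by + cz)}\,dz = ab\,e^{-(ax + by)} \qquad x > 0,\ y > 0$$
(c) By Eq. (3.78), the marginal pdf of X is
$$f_X(x) = \int_0^\infty\int_0^\infty abc\,e^{-(ax + by + cz)}\,dy\,dz = a e^{-ax} \qquad x > 0$$
(d) Similarly, we obtain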

$$f_Y(y) = b e^{-by} \quad y > 0 \qquad f_Z(z) = c e^{-cz} \quad z > 0$$
Since $f_{XYZ}(x, y, z) = f_X(x) f_Y(y) f_Z(z)$, X, Y, and Z are independent.

Show that
$$f_{XYZ}(x, y, z) = f_{Z|X,Y}(z \mid x, y)\, f_{Y|X}(y \mid x)\, f_X(x)$$
By definition (3.79),
$$f_{Z|X,Y}(z \mid x, y) = \frac{f_{XYZ}(x, y, z)}{f_{XY}(x, y)}$$
Hence
$$f_{XYZ}(x, y, z) = f_{Z|X,Y}(z \mid x, y)\, f_{XY}(x, y) \qquad (3.100)$$
Now, by Eq. (3.38),
$$f_{XY}(x, y) = f_{Y|X}(y \mid x)\, f_X(x)$$
Substituting this expression into Eq. (3.100), we obtain
$$f_{XYZ}(x, y, z) = f_{Z|X,Y}(z \mid x, y)\, f_{Y|X}(y \mid x)\, f_X(x)$$

SPECIAL DISTRIBUTIONS

Derive Eq. (3.87).

Consider a sequence of n independent multinomial trials. Let $A_i$ (i = 1, 2, ..., k) be the outcome of a single trial. The r.v. $X_i$ is equal to the number of times $A_i$ occurs in the n trials. If $x_1, x_2, \ldots, x_k$ are nonnegative integers such that their sum equals n, then for such a sequence the probability that $A_i$ occurs $x_i$ times, i = 1, 2, ..., k (that is, $P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)$) can be obtained by counting the number of sequences containing exactly $x_1$ $A_1$'s, $x_2$ $A_2$'s, ..., $x_k$ $A_k$'s and multiplying by $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$. The total number of such sequences is given by the number of ways we could lay out in a row n things, of which $x_1$ are of one kind, $x_2$ are of a second kind, ..., $x_k$ are of a kth kind. The number of ways we could choose $x_1$ positions for the $A_1$'s is $\binom{n}{x_1}$; after having put the $A_1$'s in their position, the number of ways we could choose positions for the $A_2$'s is $\binom{n - x_1}{x_2}$, and so on. Thus, the total number of sequences with $x_1$ $A_1$'s, $x_2$ $A_2$'s, ..., $x_k$ $A_k$'s is given by
$$\binom{n}{x_1}\binom{n - x_1}{x_2}\cdots\binom{n - x_1 - x_2 - \cdots - x_{k-1}}{x_k} = \frac{n!}{x_1!(n - x_1)!}\,\frac{(n - x_1)!}{x_2!(n - x_1 - x_2)!}\cdots\frac{(n - x_1 - x_2 - \cdots - x_{k-1})!}{x_k!\,0!} = \frac{n!}{x_1!\,x_2!\cdots x_k!}$$
Thus, we obtain
$$p_{X_1 X_2 \cdots X_k}(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

Suppose that a fair die is rolled seven times. Find the probability that 1 and 2 dots appear twice each; 3, 4, and 5 dots once each; and 6 dots not at all.

Let $(X_1, X_2, \ldots, X_6)$ be a six-dimensional random vector, where $X_i$ denotes the number of times i dots appear in seven rolls of a fair die. Then $(X_1, X_2, \ldots, X_6)$ is a multinomial r.v. with parameters $(7, p_1, p_2, \ldots,$

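$p_6)$ where $p_i = \tfrac{1}{6}$ (i = 1, 2, ..., 6). Hence, by Eq. (3.87),
$$P(X_1 = 2, X_2 = 2, X_3 = 1, X_4 = 1, X_5 = 1, X_6 = 0) = \frac{7!}{2!\,2!\,1!\,1!\,1!\,0!}\left(\frac{1}{6}\right)^7 = \frac{35}{6^5} \approx 0.0045$$

Show that the pmf of a multinomial r.v. given by Eq. (3.87) satisfies the condition (3.68); that is,
$$\sum\sum\cdots\sum p_{X_1 X_2 \cdots X_k}(x_1, x_2, \ldots, x_k) = 1 \qquad (3.101)$$
where the summation is over the set of all nonnegative integers $x_1, x_2, \ldots, x_k$ whose sum is n.

The multinomial theorem (which is an extension of the binomial theorem) states that
$$(a_1 + a_2 + \cdots + a_k)^n = \sum \binom{n}{x_1\ x_2\ \cdots\ x_k} a_1^{x_1} a_2^{x_2} \cdots a_k^{x_k} \qquad (3.102)$$
where $x_1 + x_2 + \cdots + x_k = n$ and
$$\binom{n}{x_1\ x_2\ \cdots\ x_k} = \frac{n!}{x_1!\,x_2!\cdots x_k!}$$
is called the multinomial coefficient, and the summation is over the set of all nonnegative integers $x_1, x_2, \ldots, x_k$ whose sum is n. Thus, setting $a_i = p_i$ in Eq. (3.102), we obtain
$$\sum\sum\cdots\sum p_{X_1 X_2 \cdots X_k}(x_1, x_2, \ldots, x_k) = (p_1 + p_2 + \cdots + p_k)^n = (1)^n = 1$$

Let (X, Y) be a bivariate normal r.v. with its pdf given by Eq. (3.88). (a) Find the marginal pdf's of X and Y. (b) Show that X and Y are independent when $\rho = 0$.

(a) By Eq. (3.30), the marginal pdf of X is
$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$$
From Eqs. (3.88) and (3.89), we have
$$f_{XY}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y(1 - \rho^2)^{1/2}}\exp\left[-\tfrac{1}{2}q(x, y)\right]$$
$$q(x, y) = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]$$
Rewriting q(x, y),
$$q(x, y) = \frac{1}{(1 - \rho^2)\sigma_Y^2}\left[y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right]^2 + \left(\frac{x - \mu_X}{\sigma_X}\right)^2$$
Then
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left[-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2\right]\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_Y(1 - \rho^2)^{1/2}}\exp\left[-\tfrac{1}{2}q_1(x, y)\right]dy$$
where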

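$$q_1(x, y) = \frac{1}{(1 - \rho^2)\sigma_Y^2}\left[y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right]^2$$
Comparing the integrand with Eq. (2.52), we see that the integrand is a normal pdf with mean $\mu_Y + \rho(\sigma_Y/\sigma_X)(x - \mu_X)$ and variance $(1 - \rho^2)\sigma_Y^2$. Thus, the integral must be unity and we obtain
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right] \qquad (3.103)$$
In a similar manner, the marginal pdf of Y is
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\left[-\frac{(y - \mu_Y)^2}{2\sigma_Y^2}\right]$$
(b) When $\rho = 0$, Eq. (3.88) reduces to
$$f_{XY}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y}\exp\left\{-\frac{1}{2}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\} = f_X(x) f_Y(y)$$
Hence, X and Y are independent.

Show that $\rho$ in Eq. (3.88) is the correlation coefficient of X and Y.

By Eqs. (3.50) and (3.53), the correlation coefficient of X and Y is
$$\rho_{XY} = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) f_{XY}(x, y)\,dx\,dy \qquad (3.105)$$
where $f_{XY}(x, y)$ is given by Eq. (3.88). By making a change in variables $v = (x - \mu_X)/\sigma_X$ and $w = (y - \mu_Y)/\sigma_Y$, we can write Eq. (3.105) as
$$\rho_{XY} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} vw\,\frac{1}{2\pi(1 - \rho^2)^{1/2}}\exp\left[-\frac{1}{2(1 - \rho^2)}(v^2 - 2\rho vw + w^2)\right]dv\,dw$$
$$= \int_{-\infty}^{\infty}\frac{w}{\sqrt{2\pi}}\left\{\int_{-\infty}^{\infty}\frac{v}{\sqrt{2\pi}(1 - \rho^2)^{1/2}}\exp\left[-\frac{(v - \rho w)^2}{2(1 - \rho^2)}\right]dv\right\}e^{-w^2/2}\,dw$$
The term in the curly braces is identified as the mean of $V = N(\rho w; 1 - \rho^2)$, and so
$$\rho_{XY} = \int_{-\infty}^{\infty}\frac{w}{\sqrt{2\pi}}(\rho w)\,e^{-w^2/2}\,dw = \rho\int_{-\infty}^{\infty} w^2\frac{1}{\sqrt{2\pi}}e^{-w^2/2}\,dw$$
The last integral is the variance of W = N(0; 1), and so it is equal to 1 and we obtain $\rho_{XY} = \rho$.

Let (X, Y) be a bivariate normal r.v. with its pdf given by Eq. (3.88). Determine E(Y | x).

By Eq. (3.58),
$$E(Y \mid x) = \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x)\,dy$$
where
$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} \qquad (3.107)$$
Substituting Eqs. (3.88) and (3.103) into Eq. (3.107), and after some cancellation and rearranging, we obtain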

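$$f_{Y|X}(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_Y(1 - \rho^2)^{1/2}}\exp\left\{-\frac{1}{2\sigma_Y^2(1 - \rho^2)}\left[y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right]^2\right\}$$
which is equal to the pdf of a normal r.v. with mean $\mu_Y + \rho(\sigma_Y/\sigma_X)(x - \mu_X)$ and variance $(1 - \rho^2)\sigma_Y^2$. Thus, we get
$$E(Y \mid x) = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$$
Note that when X and Y are independent, then $\rho = 0$ and $E(Y \mid x) = \mu_Y = E(Y)$.

The joint pdf of a bivariate r.v. (X, Y) is given by
$$f_{XY}(x, y) = \frac{1}{2\sqrt{3}\,\pi}\exp\left[-\tfrac{1}{3}(x^2 - xy + y^2 + x - 2y + 1)\right] \qquad -\infty < x, y < \infty$$
(a) Find the means of X and Y. (b) Find the variances of X and Y. (c) Find the correlation coefficient of X and Y.

We note that the term in the bracket of the exponential is a quadratic function of x and y, and hence $f_{XY}(x, y)$ could be a pdf of a bivariate normal r.v. If so, then it is simpler to solve equations for the various parameters. Now, the given joint pdf of (X, Y) can be expressed as
$$f_{XY}(x, y) = \frac{1}{2\sqrt{3}\,\pi}\exp\left[-\tfrac{1}{2}q(x, y)\right]$$
where
$$q(x, y) = \tfrac{2}{3}(x^2 - xy + y^2 + x - 2y + 1) = \tfrac{2}{3}\left[x^2 - x(y - 1) + (y - 1)^2\right]$$
Comparing the above expressions with Eqs. (3.88) and (3.89), we see that $f_{XY}(x, y)$ is the pdf of a bivariate normal r.v. with $\mu_X = 0$, $\mu_Y = 1$, and the following equations:
$$2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2} = 2\sqrt{3}\,\pi \qquad (1 - \rho^2)\sigma_X^2 = (1 - \rho^2)\sigma_Y^2 = \tfrac{3}{2} \qquad \frac{2\rho}{\sigma_X\sigma_Y(1 - \rho^2)} = \tfrac{2}{3}$$
Solving for $\sigma_X^2$, $\sigma_Y^2$, and $\rho$, we get
$$\sigma_X^2 = \sigma_Y^2 = 2 \qquad \text{and} \qquad \rho = \tfrac{1}{2}$$
Hence (a) The mean of X is zero, and the mean of Y is 1. (b) The variance of both X and Y is 2. (c) The correlation coefficient of X and Y is $\tfrac{1}{2}$.

Consider a bivariate r.v. (X, Y), where X and Y denote the horizontal and vertical miss distances, respectively, from a target when a bullet is fired. Assume that X and Y are independent and that the probability of the bullet landing on any point of the xy plane depends only on the distance of the point from the target. Show that (X, Y) is a bivariate normal r.v.

From the assumption, we have
$$f_{XY}(x, y) = f_X(x) f_Y(y) = g(x^2 + y^2) \qquad (3.109)$$
for some function g. Differentiating Eq. (3.109) with respect to x, we have
$$f_X'(x) f_Y(y) = 2x\,g'(x^2 + y^2) \qquad (3.110)$$
Dividing Eq. (3.110) by Eq. (3.109) and rearranging, we get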

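$$\frac{f_X'(x)}{2x f_X(x)} = \frac{g'(x^2 + y^2)}{g(x^2 + y^2)} \qquad (3.111)$$
Note that the left-hand side of Eq. (3.111) depends only on x, whereas the right-hand side depends only on $x^2 + y^2$; thus
$$\frac{f_X'(x)}{x f_X(x)} = c \qquad (3.112)$$
where c is a constant. Rewriting Eq. (3.112) as
$$\frac{f_X'(x)}{f_X(x)} = cx \qquad \text{or} \qquad \frac{d}{dx}[\ln f_X(x)] = cx$$
and integrating both sides, we get
$$\ln f_X(x) = \frac{c}{2}x^2 + a \qquad \text{or} \qquad f_X(x) = k e^{cx^2/2}$$
where a and k are constants. By the properties of a pdf, the constant c must be negative, and setting $c = -1/\sigma^2$, we have
$$f_X(x) = k e^{-x^2/(2\sigma^2)}$$
Thus, by Eq. (2.52), $X = N(0; \sigma^2)$ and
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/(2\sigma^2)}$$
In a similar way, we can obtain the pdf of Y as
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-y^2/(2\sigma^2)}$$
Since X and Y are independent, the joint pdf of (X, Y) is
$$f_{XY}(x, y) = f_X(x) f_Y(y) = \frac{1}{2\pi\sigma^2}e^{-(x^2 + y^2)/(2\sigma^2)}$$
which indicates that (X, Y) is a bivariate normal r.v.

Let $(X_1, X_2, \ldots, X_n)$ be an n-variate normal r.v. with its joint pdf given by Eq. (3.92). Show that if the covariance of $X_i$ and $X_j$ is zero for $i \ne j$, that is,
$$\text{Cov}(X_i, X_j) = \sigma_{ij} = \begin{cases} \sigma_i^2 & i = j \\ 0 & i \ne j \end{cases} \qquad (3.114)$$
then $X_1, X_2, \ldots, X_n$ are independent.

From Eq. (3.94) with Eq. (3.114), the covariance matrix K becomes
$$K = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}$$
It therefore follows that
$$|\det K|^{1/2} = \sigma_1\sigma_2\cdots\sigma_n \qquad (3.116)$$
and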

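$$K^{-1} = \begin{bmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n^2 \end{bmatrix}$$
Then we can write
$$(\mathbf{x} - \boldsymbol{\mu})^T K^{-1}(\mathbf{x} - \boldsymbol{\mu}) = \sum_{i=1}^{n}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2 \qquad (3.118)$$
Substituting Eqs. (3.116) and (3.118) into Eq. (3.92), we obtain
$$f_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\prod_{i=1}^{n}\sigma_i}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right] \qquad (3.119)$$
Now Eq. (3.119) can be rewritten as
$$f_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i)$$
where
$$f_{X_i}(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left[-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right]$$
Thus we conclude that $X_1, X_2, \ldots, X_n$ are independent.

Supplementary Problems

Consider an experiment of tossing a fair coin three times. Let (X, Y) be a bivariate r.v., where X denotes the number of heads on the first two tosses and Y denotes the number of heads on the third toss. (a) Find the range of X. (b) Find the range of Y. (c) Find the range of (X, Y). (d) Find (i) $P(X \le 2, Y \le 1)$; (ii) $P(X \le 1, Y \le 1)$; and (iii) $P(X \le 0, Y \le 0)$.

Let $F_{XY}(x, y)$ be a joint cdf of a bivariate r.v. (X, Y). Show that
$$P(X > a, Y > c) = 1 - F_X(a) - F_Y(c) + F_{XY}(a, c)$$
where $F_X(x)$ and $F_Y(y)$ are marginal cdf's of X and Y, respectively.

Hint: Set $x_1 = a$, $y_1 = c$, and $x_2 = y_2 = \infty$ in Eq. (3.95) and use Eqs. (3.13) and (3.14).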

127 CHAP. 31 MULTIPLE RANDOM VARIABLES Let the joint pmf of (X, Y) be given by where k is a constant. Pxy(xi, Yj) = {ri + ~ j) (a) Find the value of k. (b) Find the marginal pmf's of X and Y. Ans. (a) k = & (b) px(xi) = &2xi + 3) xi = 1, 2, 3 xi = 1, 2, 3; yj = 1, 2 otherwise The joint pdf of (X, Y) is given by where k is a constant. (a) Find the value of k. (b) Find P(X > 1, Y < I), P(X < Y), and P(X s 2). x > 0, y > 0 otherwise Ans. (a) k = 2 (b) P(X > 1, Y < 1) = e-' - e-3 z 0.318; P(X < Y) = 4; P(X I 2) = 1 - e-2 z Let (X, Y) be a bivariate r.v., where X is a uniform r.v. over (0, 0.2) and Y is an exponential r.v. with parameter 5, and X and Y are independent. (a) Find the joint pdf of (X, Y). (b) Find P(Y 5 X). Ans. (a) fx&, y) = 0 < x < 0.2, y > 0 otherwise Let the joint pdf of (X, Y) be given by (a) Show that fxdx, y) satisfies Eq. (3.26). (b) Find the marginal pdf's of X and Y. Ans. (b) fx(x) = e-" x>o 1 fr(y) = 0' Y>O The joint pdf of (X, Y) is given by where k is a constant. (a) Find the value of k. (b) Find the marginal pdf's of X and Y. Ans. (a) k = otherwise x<y<2x,o<x<2 otherwise

128 MULTIPLE RANDOM VARIABLES [CHAP 3 (b) fx(x) = &x3(4 - $x) O<x<2 O<y<2 Qy3) 2 < Y < 4 otherwise The joint pdf of (X, Y) is given by (a) Find the marginal pdf's of X and Y. (b) Are X and Y independent? Ans. (a) fx(x) = xe-"'i2 x > 0 fky) = ye-y212 y > 0 (b) Yes The joint pdf of (X, Y) is given by fxyk Y) = (a) Are X and Y independent? (b) Find the conditional pdf's of X and Y. Ans. (a) Yes x>o, y>o otherwise The joint pdf of (X, Y) is given by (a) Find the conditional pdf's of Y, given that X = x. (b) Find the conditional cdf's of Y, given that X = x. Ans. (a) fylx(y 1 x) = ex-y y2x otherwise Consider the bivariate r.v. (X, Y) of Prob (a) Find the mean and the variance of X. (b) Find the mean and the variance of Y. (c) Find the covariance of X and Y. (d) Find the correlation coefficient of X and Y. Ans. (a) E(X) = g, Var(X) = (b) E(Y)= y,var(y)= 3 (c) Cov(X, Y) = -A (d) p = Consider a bivariate r.v. (X, Y) with joint pdf Find P[(X, Y) I x2 + y2 I a2].

129 CHAP. 3) MULTIPLE RANDOM VARIABLES Ans. 1 - e-"2/(2@2) Let (X, Y) be a bivariate normal r.v., where X and Y each have zero mean and variance a2, an( f the correlation coefficient of X and Y is p. Find the joint pdf of (X, Y) The joint pdf of a bivariate r.v. (X, Y) is given by (a) Find the means and variances of X and Y. (b) Find the correlation coefficient of X and Y. Ans. (a) p, = p, = 0 ax2 = ay2 = 1 (b) P = Let (X, Y, 2) be a trivariate r.v., where X, Y, and Z are independent and each has a uniform distribution over (0, 1). Compute P(X 2 Y 2 Z). Ans. 3

Chapter 4

Functions of Random Variables, Expectation, Limit Theorems

4.1 INTRODUCTION

In this chapter we study a few basic concepts of functions of random variables and investigate the expected value of a certain function of a random variable. The techniques of moment generating functions and characteristic functions, which are very useful in some applications, are presented. Finally, the laws of large numbers and the central limit theorem, which is one of the most remarkable results in probability theory, are discussed.

4.2 FUNCTIONS OF ONE RANDOM VARIABLE

A. Random Variable g(X):

Given a r.v. X and a function g(x), the expression
$$Y = g(X)$$
defines a new r.v. Y. With y a given number, we denote by $D_Y$ the subset of $R_X$ (range of X) such that $g(x) \le y$. Then
$$(Y \le y) = [g(X) \le y] = (X \in D_Y)$$
where $(X \in D_Y)$ is the event consisting of all outcomes $\zeta$ such that the point $X(\zeta) \in D_Y$. Hence
$$F_Y(y) = P(Y \le y) = P[g(X) \le y] = P(X \in D_Y)$$
If X is a continuous r.v. with pdf $f_X(x)$, then
$$F_Y(y) = \int_{D_Y} f_X(x)\,dx$$

B. Determination of $f_Y(y)$ from $f_X(x)$:

Let X be a continuous r.v. with pdf $f_X(x)$. If the transformation y = g(x) is one-to-one and has the inverse transformation
$$x = g^{-1}(y) = h(y)$$
then the pdf of Y is given by (Prob. 4.2)
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = f_X[h(y)]\left|\frac{dh(y)}{dy}\right| \qquad (4.6)$$
Note that if g(x) is a continuous monotonic increasing or decreasing function, then the transformation y = g(x) is one-to-one. If the transformation y = g(x) is not one-to-one, $f_Y(y)$ is obtained as follows: Denoting the real roots of y = g(x) by $x_k$, that is,

$$y = g(x_1) = \cdots = g(x_k) = \cdots$$
then
$$f_Y(y) = \sum_k \frac{f_X(x_k)}{|g'(x_k)|} \qquad (4.8)$$
where g'(x) is the derivative of g(x).

4.3 FUNCTIONS OF TWO RANDOM VARIABLES

A. One Function of Two Random Variables:

Given two r.v.'s X and Y and a function g(x, y), the expression
$$Z = g(X, Y)$$
defines a new r.v. Z. With z a given number, we denote by $D_Z$ the subset of $R_{XY}$ [range of (X, Y)] such that $g(x, y) \le z$. Then
$$(Z \le z) = [g(X, Y) \le z] = \{(X, Y) \in D_Z\}$$
where $\{(X, Y) \in D_Z\}$ is the event consisting of all outcomes $\zeta$ such that the point $\{X(\zeta), Y(\zeta)\} \in D_Z$. Hence
$$F_Z(z) = P(Z \le z) = P[g(X, Y) \le z] = P\{(X, Y) \in D_Z\} \qquad (4.11)$$
If X and Y are continuous r.v.'s with joint pdf $f_{XY}(x, y)$, then
$$F_Z(z) = \iint_{D_Z} f_{XY}(x, y)\,dx\,dy$$

B. Two Functions of Two Random Variables:

Given two r.v.'s X and Y and two functions g(x, y) and h(x, y), the expression
$$Z = g(X, Y) \qquad W = h(X, Y)$$
defines two new r.v.'s Z and W. With z and w two given numbers, we denote by $D_{ZW}$ the subset of $R_{XY}$ [range of (X, Y)] such that $g(x, y) \le z$ and $h(x, y) \le w$. Then
$$(Z \le z, W \le w) = [g(X, Y) \le z,\ h(X, Y) \le w] = \{(X, Y) \in D_{ZW}\}$$
where $\{(X, Y) \in D_{ZW}\}$ is the event consisting of all outcomes $\zeta$ such that the point $\{X(\zeta), Y(\zeta)\} \in D_{ZW}$. Hence
$$F_{ZW}(z, w) = P(Z \le z, W \le w) = P\{(X, Y) \in D_{ZW}\}$$
In the continuous case, we have
$$F_{ZW}(z, w) = \iint_{D_{ZW}} f_{XY}(x, y)\,dx\,dy$$

Determination of $f_{ZW}(z, w)$ from $f_{XY}(x, y)$:

Let X and Y be two continuous r.v.'s with joint pdf $f_{XY}(x, y)$. If the transformation
$$z = g(x, y) \qquad w = h(x, y) \qquad (4.17)$$
is one-to-one and has the inverse transformation

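$$x = q(z, w) \qquad y = r(z, w)$$
then the joint pdf of Z and W is given by
$$f_{ZW}(z, w) = f_{XY}(x, y)\,|J(x, y)|^{-1} \qquad (4.19)$$
where x = q(z, w), y = r(z, w), and
$$J(x, y) = \begin{vmatrix} \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \\[2mm] \dfrac{\partial h}{\partial x} & \dfrac{\partial h}{\partial y} \end{vmatrix} = \begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\[2mm] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{vmatrix}$$
which is the jacobian of the transformation (4.17). If we define
$$\bar{J}(z, w) = \begin{vmatrix} \dfrac{\partial q}{\partial z} & \dfrac{\partial q}{\partial w} \\[2mm] \dfrac{\partial r}{\partial z} & \dfrac{\partial r}{\partial w} \end{vmatrix} = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[2mm] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix}$$
then
$$|\bar{J}(z, w)| = |J(x, y)|^{-1}$$
and Eq. (4.19) can be expressed as
$$f_{ZW}(z, w) = f_{XY}[q(z, w), r(z, w)]\,|\bar{J}(z, w)| \qquad (4.23)$$

4.4 FUNCTIONS OF n RANDOM VARIABLES

A. One Function of n Random Variables:

Given n r.v.'s $X_1, \ldots, X_n$ and a function $g(x_1, \ldots, x_n)$, the expression
$$Y = g(X_1, \ldots, X_n)$$
defines a new r.v. Y. Then
$$(Y \le y) = [g(X_1, \ldots, X_n) \le y] = [(X_1, \ldots, X_n) \in D_Y]$$
and
$$F_Y(y) = P[g(X_1, \ldots, X_n) \le y] = P[(X_1, \ldots, X_n) \in D_Y]$$
where $D_Y$ is the subset of the range of $(X_1, \ldots, X_n)$ such that $g(x_1, \ldots, x_n) \le y$. If $X_1, \ldots, X_n$ are continuous r.v.'s with joint pdf $f_{X_1 \cdots X_n}(x_1, \ldots, x_n)$, then
$$F_Y(y) = \int\cdots\int_{D_Y} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$$

B. n Functions of n Random Variables:

When the joint pdf of n r.v.'s $X_1, \ldots, X_n$ is given and we want to determine the joint pdf of n r.v.'s $Y_1, \ldots, Y_n$, where
$$Y_1 = g_1(X_1, \ldots, X_n) \quad \cdots \quad Y_n = g_n(X_1, \ldots, X_n) \qquad (4.28)$$
the approach is the same as for two r.v.'s. We shall assume that the transformation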

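$$y_1 = g_1(x_1, \ldots, x_n) \quad \cdots \quad y_n = g_n(x_1, \ldots, x_n) \qquad (4.29)$$
is one-to-one and has the inverse transformation
$$x_1 = h_1(y_1, \ldots, y_n) \quad \cdots \quad x_n = h_n(y_1, \ldots, y_n)$$
Then the joint pdf of $Y_1, \ldots, Y_n$ is given by
$$f_{Y_1 \cdots Y_n}(y_1, \ldots, y_n) = f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\,|J(x_1, \ldots, x_n)|^{-1}$$
where
$$J(x_1, \ldots, x_n) = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{vmatrix}$$
which is the jacobian of the transformation (4.29).

4.5 EXPECTATION

A. Expectation of a Function of One Random Variable:

The expectation of Y = g(X) is given by
$$E(Y) = E[g(X)] = \begin{cases} \displaystyle\sum_i g(x_i)\,p_X(x_i) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{(continuous case)} \end{cases}$$

B. Expectation of a Function of More than One Random Variable:

Let $X_1, \ldots, X_n$ be n r.v.'s, and let $Y = g(X_1, \ldots, X_n)$. Then
$$E(Y) = E[g(X_1, \ldots, X_n)] = \begin{cases} \displaystyle\sum_{x_1}\cdots\sum_{x_n} g(x_1, \ldots, x_n)\,p_{X_1 \cdots X_n}(x_1, \ldots, x_n) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1, \ldots, x_n) f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n & \text{(continuous case)} \end{cases} \qquad (4.34)$$

C. Linearity Property of Expectation:

Note that the expectation operation is linear (Prob. 4.39), and we have
$$E\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i E(X_i)$$
where the $a_i$'s are constants. If r.v.'s X and Y are independent, then we have (Prob. 4.41)
$$E[g(X)h(Y)] = E[g(X)]\,E[h(Y)] \qquad (4.36)$$
The relation (4.36) can be generalized to a mutually independent set of n r.v.'s $X_1, \ldots, X_n$:
$$E\left[\prod_{i=1}^{n} g_i(X_i)\right] = \prod_{i=1}^{n} E[g_i(X_i)] \qquad (4.37)$$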
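The linearity property and the product rule (4.36) for independent r.v.'s can be verified exactly on a small discrete example. The sketch below is only an illustration: the two pmf's, the constants a and b, and the functions g and h are arbitrary choices made here, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical pmf's of two independent discrete r.v.'s X and Y.
p_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
p_y = {1: Fraction(1, 3), 3: Fraction(2, 3)}


def expect(pmf, g=lambda v: v):
    """E[g(V)] for a discrete r.v. V with the given pmf."""
    return sum(g(v) * p for v, p in pmf.items())


def expect_joint(g):
    """E[g(X, Y)] using the joint pmf p_X(x) p_Y(y) (independence)."""
    return sum(g(x, y) * p_x[x] * p_y[y] for x, y in product(p_x, p_y))


# Linearity: E(aX + bY) = a E(X) + b E(Y)
a, b = 3, -2
lhs_linear = expect_joint(lambda x, y: a * x + b * y)
rhs_linear = a * expect(p_x) + b * expect(p_y)

# Product rule for independent r.v.'s: E[g(X) h(Y)] = E[g(X)] E[h(Y)]
g = lambda x: x ** 2
h = lambda y: y + 1
lhs_product = expect_joint(lambda x, y: g(x) * h(y))
rhs_product = expect(p_x, g) * expect(p_y, h)

print(lhs_linear, rhs_linear)
print(lhs_product, rhs_product)
```

Linearity holds whether or not X and Y are independent; the product rule relies on the joint pmf factoring as $p_X(x)p_Y(y)$, which is built into `expect_joint` here.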

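D. Conditional Expectation as a Random Variable:

In Sec. 3.8 we defined the conditional expectation of Y given X = x, E(Y | x) [Eq. (3.58)], which is, in general, a function of x, say H(x). Now H(X) is a function of the r.v. X; that is,
$$H(X) = E(Y \mid X) \qquad (4.38)$$
Thus, E(Y | X) is a function of the r.v. X. Note that E(Y | X) has the following property (Prob. 4.38):
$$E[E(Y \mid X)] = E(Y) \qquad (4.39)$$

4.6 MOMENT GENERATING FUNCTIONS

A. Definition:

The moment generating function of a r.v. X is defined by
$$M_X(t) = E(e^{tX}) = \begin{cases} \displaystyle\sum_i e^{tx_i}\,p_X(x_i) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx & \text{(continuous case)} \end{cases} \qquad (4.40)$$
where t is a real variable. Note that $M_X(t)$ may not exist for all r.v.'s X. In general, $M_X(t)$ will exist only for those values of t for which the sum or integral of Eq. (4.40) converges absolutely. Suppose that $M_X(t)$ exists. If we express $e^{tX}$ formally and take expectation, then
$$M_X(t) = E(e^{tX}) = E\left[1 + tX + \frac{1}{2!}(tX)^2 + \cdots + \frac{1}{k!}(tX)^k + \cdots\right] = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \cdots + \frac{t^k}{k!}E(X^k) + \cdots \qquad (4.41)$$
and the kth moment of X is given by
$$m_k = E(X^k) = M_X^{(k)}(0) \qquad k = 1, 2, \ldots$$
where
$$M_X^{(k)}(0) = \left.\frac{d^k}{dt^k}M_X(t)\right|_{t=0}$$

B. Joint Moment Generating Function:

The joint moment generating function $M_{XY}(t_1, t_2)$ of two r.v.'s X and Y is defined by
$$M_{XY}(t_1, t_2) = E[e^{(t_1 X + t_2 Y)}] \qquad (4.44)$$
where $t_1$ and $t_2$ are real variables. Proceeding as we did in Eq. (4.41), we can establish that
$$M_{XY}(t_1, t_2) = \sum_{k=0}^{\infty}\sum_{n=0}^{\infty}\frac{t_1^k t_2^n}{k!\,n!}E(X^k Y^n)$$
and the (k, n) joint moment of X and Y is given by
$$m_{kn} = E(X^k Y^n) = M_{XY}^{(kn)}(0, 0)$$
where
$$M_{XY}^{(kn)}(0, 0) = \left.\frac{\partial^{k+n}}{\partial t_1^k\,\partial t_2^n}M_{XY}(t_1, t_2)\right|_{t_1 = t_2 = 0}$$
In a similar fashion, we can define the joint moment generating function of n r.v.'s $X_1, \ldots, X_n$ by
$$M_{X_1 \cdots X_n}(t_1, \ldots, t_n) = E[e^{(t_1 X_1 + \cdots + t_n X_n)}] \qquad (4.48)$$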

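from which the various moments can be computed. If $X_1, \ldots, X_n$ are independent, then
$$M_{X_1 \cdots X_n}(t_1, \ldots, t_n) = E(e^{t_1 X_1})\cdots E(e^{t_n X_n}) = M_{X_1}(t_1)\cdots M_{X_n}(t_n)$$

C. Lemmas for Moment Generating Functions:

Two important lemmas concerning moment generating functions are stated in the following:

Lemma 4.1: If two r.v.'s have the same moment generating functions, then they must have the same distribution.

Lemma 4.2: Given cdf's $F(x), F_1(x), F_2(x), \ldots$ with corresponding moment generating functions $M(t), M_1(t), M_2(t), \ldots$, then $F_n(x) \to F(x)$ if $M_n(t) \to M(t)$.

4.7 CHARACTERISTIC FUNCTIONS

A. Definition:

The characteristic function of a r.v. X is defined by
$$\Psi_X(\omega) = E(e^{j\omega X}) = \begin{cases} \displaystyle\sum_i e^{j\omega x_i}\,p_X(x_i) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty} e^{j\omega x} f_X(x)\,dx & \text{(continuous case)} \end{cases} \qquad (4.50)$$
where $\omega$ is a real variable and $j = \sqrt{-1}$. Note that $\Psi_X(\omega)$ is obtained by replacing t in $M_X(t)$ by $j\omega$ if $M_X(t)$ exists. Thus, the characteristic function has all the properties of the moment generating function. Now
$$|\Psi_X(\omega)| = \left|\sum_i e^{j\omega x_i}\,p_X(x_i)\right| \le \sum_i |e^{j\omega x_i}|\,p_X(x_i) = \sum_i p_X(x_i) = 1 < \infty$$
for the discrete case and
$$|\Psi_X(\omega)| = \left|\int_{-\infty}^{\infty} e^{j\omega x} f_X(x)\,dx\right| \le \int_{-\infty}^{\infty} |e^{j\omega x}| f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1 < \infty$$
for the continuous case. Thus, the characteristic function $\Psi_X(\omega)$ is always defined even if the moment generating function $M_X(t)$ is not (Prob. 4.58). Note that $\Psi_X(\omega)$ of Eq. (4.50) for the continuous case is the Fourier transform (with the sign of j reversed) of $f_X(x)$. Because of this fact, if $\Psi_X(\omega)$ is known, $f_X(x)$ can be found from the inverse Fourier transform; that is,
$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi_X(\omega)\,e^{-j\omega x}\,d\omega$$

B. Joint Characteristic Functions:

The joint characteristic function $\Psi_{XY}(\omega_1, \omega_2)$ of two r.v.'s X and Y is defined by
$$\Psi_{XY}(\omega_1, \omega_2) = E[e^{j(\omega_1 X + \omega_2 Y)}] = \begin{cases} \displaystyle\sum_i\sum_k e^{j(\omega_1 x_i + \omega_2 y_k)}\,p_{XY}(x_i, y_k) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{j(\omega_1 x + \omega_2 y)} f_{XY}(x, y)\,dx\,dy & \text{(continuous case)} \end{cases} \qquad (4.52)$$
where $\omega_1$ and $\omega_2$ are real variables.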
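For a discrete r.v. the characteristic function is a finite sum of complex exponentials, so its basic properties are easy to check numerically. The sketch below uses an arbitrary three-point pmf chosen here for illustration. It checks that $\Psi_X(0) = 1$, that $|\Psi_X(\omega)| \le 1$, and that the derivative at the origin is $jE(X)$; the last relation is assumed to follow from the moment property of $M_X(t)$ with t replaced by $j\omega$.

```python
import cmath

# Hypothetical pmf of a discrete r.v. X (values and probabilities chosen for illustration).
pmf = {0: 0.2, 1: 0.5, 3: 0.3}


def char_fn(omega):
    """Psi_X(omega) = sum_i exp(j * omega * x_i) * p_X(x_i)."""
    return sum(p * cmath.exp(1j * omega * x) for x, p in pmf.items())


mean = sum(x * p for x, p in pmf.items())

# Central-difference estimate of the derivative of Psi_X at omega = 0.
step = 1e-6
deriv0 = (char_fn(step) - char_fn(-step)) / (2 * step)

print("Psi_X(0)       =", char_fn(0.0))
print("|Psi_X(2.5)|   =", abs(char_fn(2.5)))
print("Psi_X'(0)      =", deriv0)
print("j * E(X)       =", 1j * mean)
```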

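The expression of Eq. (4.52) for the continuous case is recognized as the two-dimensional Fourier transform (with the sign of j reversed) of $f_{XY}(x, y)$. Thus, from the inverse Fourier transform, we have
$$f_{XY}(x, y) = \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Psi_{XY}(\omega_1, \omega_2)\,e^{-j(\omega_1 x + \omega_2 y)}\,d\omega_1\,d\omega_2$$
From Eqs. (4.50) and (4.52), we see that
$$\Psi_X(\omega) = \Psi_{XY}(\omega, 0) \qquad \Psi_Y(\omega) = \Psi_{XY}(0, \omega)$$
which are called marginal characteristic functions. Similarly, we can define the joint characteristic function of n r.v.'s $X_1, \ldots, X_n$ by
$$\Psi_{X_1 \cdots X_n}(\omega_1, \ldots, \omega_n) = E[e^{j(\omega_1 X_1 + \cdots + \omega_n X_n)}]$$
As in the case of the moment generating function, if $X_1, \ldots, X_n$ are independent, then
$$\Psi_{X_1 \cdots X_n}(\omega_1, \ldots, \omega_n) = \Psi_{X_1}(\omega_1)\cdots\Psi_{X_n}(\omega_n)$$

C. Lemmas for Characteristic Functions:

As with the moment generating function, we have the following two lemmas:

Lemma 4.3: A distribution function is uniquely determined by its characteristic function.

Lemma 4.4: Given cdf's $F(x), F_1(x), F_2(x), \ldots$ with corresponding characteristic functions $\Psi(\omega), \Psi_1(\omega), \Psi_2(\omega), \ldots$, then $F_n(x) \to F(x)$ at points of continuity of F(x) if and only if $\Psi_n(\omega) \to \Psi(\omega)$ for every $\omega$.

4.8 THE LAWS OF LARGE NUMBERS AND THE CENTRAL LIMIT THEOREM

A. The Weak Law of Large Numbers:

Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed r.v.'s each with a finite mean $E(X_i) = \mu$. Let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}(X_1 + \cdots + X_n) \qquad (4.57)$$
Then, for any $\varepsilon > 0$,
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0 \qquad (4.58)$$
Equation (4.58) is known as the weak law of large numbers, and $\bar{X}_n$ is known as the sample mean.

B. The Strong Law of Large Numbers:

Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed r.v.'s each with a finite mean $E(X_i) = \mu$. Then, for any $\varepsilon > 0$,
$$P\left(\lim_{n \to \infty}|\bar{X}_n - \mu| > \varepsilon\right) = 0 \qquad (4.59)$$
where $\bar{X}_n$ is the sample mean defined by Eq. (4.57). Equation (4.59) is known as the strong law of large numbers.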
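The weak law can be illustrated by simulation: for a fixed tolerance, the fraction of experiments in which the sample mean misses the true mean by more than that tolerance should shrink as n grows. The following sketch assumes uniform r.v.'s over (0, 1), so that the mean is 0.5; the tolerance, trial counts, and seed are arbitrary choices made here.

```python
import random


def sample_mean(n, rng):
    """Sample mean of n independent uniform(0, 1) r.v.'s."""
    return sum(rng.random() for _ in range(n)) / n


def tail_prob(n, eps, trials, rng, mu=0.5):
    """Estimate P(|sample mean - mu| > eps) from repeated experiments."""
    misses = sum(abs(sample_mean(n, rng) - mu) > eps for _ in range(trials))
    return misses / trials


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (10, 100, 1000):
    print(n, tail_prob(n, eps=0.1, trials=500, rng=rng))
```

The printed estimates are Monte Carlo approximations and vary with the seed; only the downward trend in n is the point of the example.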

Notice the important difference between Eqs. (4.58) and (4.59). Equation (4.58) tells us how a sequence of probabilities converges, and Eq. (4.59) tells us how the sequence of r.v.'s behaves in the limit. The strong law of large numbers tells us that the sequence $(\bar{X}_n)$ is converging to the constant $\mu$.

C. The Central Limit Theorem:

The central limit theorem is one of the most remarkable results in probability theory. There are many versions of this theorem. In its simplest form, the central limit theorem is stated as follows:

Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed r.v.'s each with mean $\mu$ and variance $\sigma^2$. Let
$$Z_n = \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
where $\bar{X}_n$ is defined by Eq. (4.57). Then the distribution of $Z_n$ tends to the standard normal as $n \to \infty$; that is,
$$\lim_{n \to \infty} Z_n = N(0; 1)$$
or
$$\lim_{n \to \infty} F_{Z_n}(z) = \lim_{n \to \infty} P(Z_n \le z) = \Phi(z)$$
where
$$\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-\xi^2/2}\,d\xi$$
is the cdf of a standard normal r.v. [Eq. (2.54)]. Thus, the central limit theorem tells us that for large n, the distribution of the sum $S_n = X_1 + \cdots + X_n$ is approximately normal regardless of the form of the distribution of the individual $X_i$'s. Notice how much stronger this theorem is than the laws of large numbers. In practice, whenever an observed r.v. is known to be a sum of a large number of r.v.'s, then the central limit theorem gives us some justification for assuming that this sum is normally distributed.

Solved Problems

FUNCTIONS OF ONE RANDOM VARIABLE

4.1. If X is $N(\mu; \sigma^2)$, then show that $Z = (X - \mu)/\sigma$ is a standard normal r.v.; that is, N(0; 1).

The cdf of Z is
$$F_Z(z) = P(Z \le z) = P\left(\frac{X - \mu}{\sigma} \le z\right) = P(X \le z\sigma + \mu) = \int_{-\infty}^{z\sigma + \mu}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-(x - \mu)^2/(2\sigma^2)}\,dx$$
By the change of variable $y = (x - \mu)/\sigma$ (that is, $x = \sigma y + \mu$), we obtain

$$F_Z(z) = P(Z \le z) = \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy$$
and
$$f_Z(z) = \frac{dF_Z(z)}{dz} = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}$$
which indicates that Z = N(0; 1).

Verify Eq. (4.6).

Assume that y = g(x) is a continuous monotonically increasing function [Fig. 4-1(a)]. Since y = g(x) is monotonically increasing, it has an inverse that we denote by $x = g^{-1}(y) = h(y)$. Then
$$F_Y(y) = P(Y \le y) = P[X \le h(y)] = F_X[h(y)]$$
and
$$f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{d}{dy}\{F_X[h(y)]\}$$
Applying the chain rule of differentiation to this expression yields
$$f_Y(y) = f_X[h(y)]\frac{d}{dy}h(y)$$
which can be written as
$$f_Y(y) = f_X(x)\frac{dx}{dy} \qquad x = h(y) \qquad (4.64)$$
If y = g(x) is monotonically decreasing [Fig. 4-1(b)], then
$$F_Y(y) = P(Y \le y) = P[X > h(y)] = 1 - F_X[h(y)]$$
Thus,
$$f_Y(y) = \frac{d}{dy}F_Y(y) = -f_X(x)\frac{dx}{dy} \qquad x = h(y) \qquad (4.66)$$
In Eq. (4.66), since y = g(x) is monotonically decreasing, dy/dx (and dx/dy) is negative. Combining Eqs. (4.64) and (4.66), we obtain
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = f_X[h(y)]\left|\frac{dh(y)}{dy}\right|$$
which is valid for any continuous monotonic (increasing or decreasing) function y = g(x).

Let X be a r.v. with cdf $F_X(x)$ and pdf $f_X(x)$. Let Y = aX + b, where a and b are real constants and $a \ne 0$. (a) Find the cdf of Y in terms of $F_X(x)$.

Fig. 4-1

(b) Find the pdf of Y in terms of $f_X(x)$.

(a) If a > 0, then [Fig. 4-2(a)]
$$F_Y(y) = P(Y \le y) = P(aX + b \le y) = P\left(X \le \frac{y - b}{a}\right) = F_X\left(\frac{y - b}{a}\right) \qquad (4.67)$$
If a < 0, then [Fig. 4-2(b)]
$$F_Y(y) = P(Y \le y) = P(aX + b \le y) = P(aX \le y - b) = P\left(X \ge \frac{y - b}{a}\right)$$
(since a < 0, note the change in the inequality sign)
$$= 1 - P\left(X < \frac{y - b}{a}\right) = 1 - P\left(X \le \frac{y - b}{a}\right) + P\left(X = \frac{y - b}{a}\right) = 1 - F_X\left(\frac{y - b}{a}\right) + P\left(X = \frac{y - b}{a}\right)$$
Note that if X is continuous, then $P[X = (y - b)/a] = 0$, and
$$F_Y(y) = 1 - F_X\left(\frac{y - b}{a}\right) \qquad a < 0 \qquad (4.69)$$

Fig. 4-2

(b) From Fig. 4-2, we see that y = g(x) = ax + b is a continuous monotonically increasing (a > 0) or decreasing (a < 0) function. Its inverse is $x = g^{-1}(y) = h(y) = (y - b)/a$, and dx/dy = 1/a. Thus, by Eq. (4.6),
$$f_Y(y) = \frac{1}{|a|}f_X\left(\frac{y - b}{a}\right) \qquad (4.70)$$
Note that Eq. (4.70) can also be obtained by differentiating Eqs. (4.67) and (4.69) with respect to y.

Let Y = aX + b. Determine the pdf of Y, if X is a uniform r.v. over (0, 1).

The pdf of X is [Eq. (2.44)]
$$f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
Then by Eq. (4.70), we get
$$f_Y(y) = \frac{1}{|a|}f_X\left(\frac{y - b}{a}\right) = \begin{cases} \dfrac{1}{|a|} & y \in R_Y \\ 0 & \text{otherwise} \end{cases}$$

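Fig. 4-3

The range $R_Y$ is found as follows: From Fig. 4-3, we see that

For a > 0: $R_Y = \{y: b < y < a + b\}$

For a < 0: $R_Y = \{y: a + b < y < b\}$

4.5. Let Y = aX + b. Show that if $X = N(\mu; \sigma^2)$, then $Y = N(a\mu + b; a^2\sigma^2)$, and find the values of a and b so that Y = N(0; 1).

Since $X = N(\mu; \sigma^2)$, by Eq. (2.52),
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$
Hence, by Eq. (4.70),
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,|a|\sigma}\exp\left\{-\frac{1}{2\sigma^2}\left(\frac{y - b}{a} - \mu\right)^2\right\} = \frac{1}{\sqrt{2\pi}\,|a|\sigma}\exp\left\{-\frac{[y - (a\mu + b)]^2}{2a^2\sigma^2}\right\}$$
which is the pdf of $N(a\mu + b; a^2\sigma^2)$. Hence, $Y = N(a\mu + b; a^2\sigma^2)$. Next, let $a\mu + b = 0$ and $a^2\sigma^2 = 1$, from which we get $a = 1/\sigma$ and $b = -\mu/\sigma$. Thus, $Y = (X - \mu)/\sigma$ is N(0; 1) (see Prob. 4.1).

Let X be a r.v. with pdf $f_X(x)$. Let $Y = X^2$. Find the pdf of Y.

The event $A = (Y \le y)$ in $R_Y$ is equivalent to the event $B = (-\sqrt{y} \le X \le \sqrt{y})$ in $R_X$ (Fig. 4-4). If $y \le 0$, then
$$F_Y(y) = P(Y \le y) = 0 \qquad \text{and} \qquad f_Y(y) = 0$$
If y > 0, then
$$F_Y(y) = P(Y \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y})$$
and
$$f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right]$$
Thus,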

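$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right] & y > 0 \\ 0 & y \le 0 \end{cases} \qquad (4.74)$$

Fig. 4-4

Alternative Solution:

If y < 0, then the equation $y = x^2$ has no real solutions; hence $f_Y(y) = 0$. If y > 0, then $y = x^2$ has two solutions, $x_1 = \sqrt{y}$ and $x_2 = -\sqrt{y}$. Now, $y = g(x) = x^2$ and $g'(x) = 2x$. Hence, by Eq. (4.8),
$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right] & y > 0 \\ 0 & y < 0 \end{cases}$$

4.7. Let $Y = X^2$. Find the pdf of Y if X = N(0; 1).

Since X = N(0; 1),
$$f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$$
Since $f_X(x)$ is an even function, by Eq. (4.74), we obtain
$$f_Y(y) = \begin{cases} \dfrac{1}{\sqrt{y}}f_X(\sqrt{y}) = \dfrac{1}{\sqrt{2\pi y}}e^{-y/2} & y > 0 \\ 0 & y < 0 \end{cases}$$

4.8. Let $Y = X^2$. Find and sketch the pdf of Y if X is a uniform r.v. over (-1, 2).

The pdf of X is [Eq. (2.44)] [Fig. 4-5(a)]
$$f_X(x) = \begin{cases} \tfrac{1}{3} & -1 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$
In this case, the range of Y is (0, 4), and we must be careful in applying Eq. (4.74). When 0 < y < 1, both $\sqrt{y}$ and $-\sqrt{y}$ are in $R_X = (-1, 2)$, and by Eq. (4.74),
$$f_Y(y) = \frac{1}{2\sqrt{y}}\left(\frac{1}{3} + \frac{1}{3}\right) = \frac{1}{3\sqrt{y}}$$
When 1 < y < 4, $\sqrt{y}$ is in $R_X = (-1, 2)$ but $-\sqrt{y} < -1$, and by Eq. (4.74),
$$f_Y(y) = \frac{1}{2\sqrt{y}}\left(\frac{1}{3} + 0\right) = \frac{1}{6\sqrt{y}}$$
Thus,
$$f_Y(y) = \begin{cases} \dfrac{1}{3\sqrt{y}} & 0 < y < 1 \\[2mm] \dfrac{1}{6\sqrt{y}} & 1 < y < 4 \\ 0 & \text{otherwise} \end{cases}$$
which is sketched in Fig. 4-5(b).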
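The root-sum formula of Eq. (4.8), $f_Y(y) = \sum_k f_X(x_k)/|g'(x_k)|$, translates directly into code. The sketch below applies it to $Y = X^2$ with X uniform over (-1, 2) and lets the support check inside `f_x` drop the root $-\sqrt{y}$ automatically when it falls outside (-1, 2). The function names are illustrative only.

```python
import math


def f_x(x):
    """pdf of X uniform over (-1, 2)."""
    return 1.0 / 3.0 if -1.0 < x < 2.0 else 0.0


def f_y(y):
    """pdf of Y = X**2 via the root-sum formula sum_k f_X(x_k) / |g'(x_k)|."""
    if y <= 0.0:
        return 0.0
    roots = (math.sqrt(y), -math.sqrt(y))  # real roots of y = x**2
    return sum(f_x(r) / abs(2.0 * r) for r in roots)  # g'(x) = 2x


for y in (0.25, 2.25, 5.0):
    print(y, f_y(y))
```

For y = 0.25 both roots contribute, for y = 2.25 only $\sqrt{y}$ does, and for y = 5 neither root lies in the range of X, matching the three branches of the piecewise result.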

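Let $Y = e^X$. Find the pdf of Y if X is a uniform r.v. over (0, 1).

The pdf of X is
$$f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
The cdf of Y is
$$F_Y(y) = P(Y \le y) = P(e^X \le y) = P(X \le \ln y) = \int_0^{\ln y} dx = \ln y \qquad 1 < y < e$$
Thus,
$$f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{d}{dy}\ln y = \frac{1}{y} \qquad 1 < y < e$$

Alternative Solution:

The function $y = g(x) = e^x$ is a continuous monotonically increasing function. Its inverse is $x = g^{-1}(y) = h(y) = \ln y$. Thus, by Eq. (4.6), we obtain
$$f_Y(y) = f_X(\ln y)\left|\frac{d}{dy}\ln y\right| = \frac{1}{y}f_X(\ln y) = \begin{cases} \dfrac{1}{y} & 1 < y < e \\ 0 & \text{otherwise} \end{cases}$$

Let $Y = e^X$. Find the pdf of Y if $X = N(\mu; \sigma^2)$.

The pdf of X is [Eq. (2.52)]
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \qquad -\infty < x < \infty$$
Thus, using the technique shown in the alternative solution of Prob. 4.9, we obtain
$$f_Y(y) = \frac{1}{y}f_X(\ln y) = \frac{1}{y\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(\ln y - \mu)^2}{2\sigma^2}\right] \qquad 0 < y < \infty$$
Note that X = ln Y is the normal r.v.; hence, the r.v. Y is called the log-normal r.v.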
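As a numerical cross-check of the log-normal result, the pdf $f_Y(y) = f_X(\ln y)/y$ should equal the derivative of the cdf $F_Y(y) = P(X \le \ln y)$. The sketch below writes the normal cdf with `math.erf` and compares a central-difference derivative of the cdf with the pdf; the parameter values $\mu = 0.3$, $\sigma = 0.8$ and the test points are arbitrary choices made here.

```python
import math


def normal_pdf(x, mu, sigma):
    """pdf of N(mu; sigma**2)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def lognormal_pdf(y, mu, sigma):
    """pdf of Y = exp(X), X = N(mu; sigma**2): f_Y(y) = f_X(ln y) / y for y > 0."""
    if y <= 0.0:
        return 0.0
    return normal_pdf(math.log(y), mu, sigma) / y


def lognormal_cdf(y, mu, sigma):
    """cdf of Y: F_Y(y) = P(X <= ln y), written with the error function."""
    if y <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(y) - mu) / (sigma * math.sqrt(2.0))))


mu, sigma, step = 0.3, 0.8, 1e-5
for y in (0.5, 1.0, 2.5):
    slope = (lognormal_cdf(y + step, mu, sigma) - lognormal_cdf(y - step, mu, sigma)) / (2 * step)
    print(y, lognormal_pdf(y, mu, sigma), slope)
```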

Let Y = tan X. Find the pdf of Y if X is a uniform r.v. over $(-\pi/2, \pi/2)$.

The cdf of X is [Eq. (2.45)]
$$F_X(x) = \begin{cases} 0 & x \le -\pi/2 \\ \dfrac{1}{\pi}\left(x + \dfrac{\pi}{2}\right) & -\pi/2 < x < \pi/2 \\ 1 & x \ge \pi/2 \end{cases}$$
Now
$$F_Y(y) = P(Y \le y) = P(\tan X \le y) = P(X \le \tan^{-1} y) = \frac{1}{\pi}\left(\tan^{-1} y + \frac{\pi}{2}\right) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1} y \qquad -\infty < y < \infty$$
Then the pdf of Y is given by
$$f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{1}{\pi(1 + y^2)} \qquad -\infty < y < \infty$$
Note that the r.v. Y is a Cauchy r.v. with parameter 1.

Let X be a continuous r.v. with the cdf $F_X(x)$. Let $Y = F_X(X)$. Show that Y is a uniform r.v. over (0, 1).

Notice from the properties of a cdf that $y = F_X(x)$ is a monotonically nondecreasing function. Since $0 \le F_X(x) \le 1$ for all real x, y takes on values only on the interval (0, 1). Using Eq. (4.64) (Prob. 4.2), we have
$$f_Y(y) = f_X(x)\frac{dx}{dy} = \frac{f_X(x)}{dy/dx} = \frac{f_X(x)}{f_X(x)} = 1 \qquad 0 < y < 1$$
Hence, Y is a uniform r.v. over (0, 1).

Let Y be a uniform r.v. over (0, 1). Let F(x) be a function which has the properties of the cdf of a continuous r.v. with F(a) = 0, F(b) = 1, and F(x) strictly increasing for a < x < b, where a and b could be $-\infty$ and $\infty$, respectively. Let $X = F^{-1}(Y)$. Show that the cdf of X is F(x).

$$F_X(x) = P(X \le x) = P[F^{-1}(Y) \le x]$$
Since F(x) is strictly increasing, $F^{-1}(Y) \le x$ is equivalent to $Y \le F(x)$, and hence
$$F_X(x) = P(X \le x) = P[Y \le F(x)]$$
Now Y is a uniform r.v. over (0, 1), and by Eq. (2.45),
$$F_Y(y) = P(Y \le y) = y \qquad 0 < y < 1$$
and accordingly,
$$F_X(x) = P(X \le x) = P[Y \le F(x)] = F(x) \qquad 0 < F(x) < 1$$
Note that this problem is the converse of the preceding problem.
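The result $X = F^{-1}(Y)$ is the basis of inverse-transform sampling: uniform random numbers are pushed through the inverse cdf to obtain samples with a prescribed distribution. The sketch below assumes an exponential target cdf $F(x) = 1 - e^{-\lambda x}$, for which $F^{-1}(y) = -\ln(1 - y)/\lambda$; the choice of target, the rate $\lambda = 2$, and the seed are illustrative assumptions, not part of the problem.

```python
import math
import random


def inverse_cdf_exponential(y, lam):
    """F^{-1}(y) for F(x) = 1 - exp(-lam * x), with 0 <= y < 1."""
    return -math.log(1.0 - y) / lam


def draw(n, lam, rng):
    """Draw n samples of X = F^{-1}(Y), with Y uniform over (0, 1)."""
    return [inverse_cdf_exponential(rng.random(), lam) for _ in range(n)]


lam = 2.0
rng = random.Random(42)  # fixed seed so the run is reproducible
samples = draw(100000, lam, rng)

sample_avg = sum(samples) / len(samples)
x0 = 0.3
empirical_cdf = sum(s <= x0 for s in samples) / len(samples)

print("sample mean   :", sample_avg, "(theory:", 1.0 / lam, ")")
print("empirical F(x0):", empirical_cdf, "(theory:", 1.0 - math.exp(-lam * x0), ")")
```

The empirical mean and cdf are Monte Carlo estimates, so they agree with the theoretical values only up to sampling error.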

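Let X be a continuous r.v. with the pdf
$$f_X(x) = \begin{cases} e^{-x} & x > 0 \\ 0 & x < 0 \end{cases}$$
Find the transformation Y = g(X) such that the pdf of Y is
$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}} & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
The cdf of X is
$$F_X(x) = \begin{cases} 1 - e^{-x} & x > 0 \\ 0 & x < 0 \end{cases}$$
Then from the result of Prob. 4.12, the r.v. $Z = 1 - e^{-X}$ is uniformly distributed over (0, 1). Similarly, the cdf of Y is
$$F_Y(y) = \int_0^y \frac{1}{2\sqrt{\eta}}\,d\eta = \sqrt{y} \qquad 0 < y < 1$$
and the r.v. $W = \sqrt{Y}$ is uniformly distributed over (0, 1). Thus, by setting Z = W, the required transformation is $Y = (1 - e^{-X})^2$.

FUNCTIONS OF TWO RANDOM VARIABLES

Consider Z = X + Y. Show that if X and Y are independent Poisson r.v.'s with parameters $\lambda_1$ and $\lambda_2$, respectively, then Z is also a Poisson r.v. with parameter $\lambda_1 + \lambda_2$.

We can write the event
$$(X + Y = n) = \bigcup_{i=0}^{n}(X = i, Y = n - i)$$
where events (X = i, Y = n - i), i = 0, 1, ..., n, are disjoint. Since X and Y are independent, by Eqs. (1.46) and (2.40), we have
$$P(Z = n) = P(X + Y = n) = \sum_{i=0}^{n}P(X = i, Y = n - i) = \sum_{i=0}^{n}P(X = i)P(Y = n - i)$$
$$= \sum_{i=0}^{n}e^{-\lambda_1}\frac{\lambda_1^i}{i!}\,e^{-\lambda_2}\frac{\lambda_2^{n-i}}{(n - i)!} = e^{-(\lambda_1 + \lambda_2)}\sum_{i=0}^{n}\frac{\lambda_1^i\lambda_2^{n-i}}{i!\,(n - i)!}$$
$$= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!}\sum_{i=0}^{n}\frac{n!}{i!\,(n - i)!}\lambda_1^i\lambda_2^{n-i} = \frac{e^{-(\lambda_1 + \lambda_2)}}{n!}(\lambda_1 + \lambda_2)^n$$
which indicates that Z = X + Y is a Poisson r.v. with parameter $\lambda_1 + \lambda_2$.

Consider two r.v.'s X and Y with joint pdf $f_{XY}(x, y)$. Let Z = X + Y. (a) Determine the pdf of Z. (b) Determine the pdf of Z if X and Y are independent.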

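Fig. 4-6

(a) The range $R_Z$ of Z corresponding to the event $(Z \le z) = (X + Y \le z)$ is the set of points (x, y) which lie on and to the left of the line z = x + y (Fig. 4-6). Thus, we have
$$F_Z(z) = P(X + Y \le z) = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{z - x} f_{XY}(x, y)\,dy\right]dx$$
Then
$$f_Z(z) = \frac{d}{dz}F_Z(z) = \int_{-\infty}^{\infty}\left[\frac{d}{dz}\int_{-\infty}^{z - x} f_{XY}(x, y)\,dy\right]dx = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\,dx \qquad (4.79)$$
(b) If X and Y are independent, then Eq. (4.79) reduces to
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\,dx \qquad (4.80a)$$
The integral on the right-hand side of Eq. (4.80a) is known as a convolution of $f_X(z)$ and $f_Y(z)$. Since the convolution is commutative, Eq. (4.80a) can also be written as
$$f_Z(z) = \int_{-\infty}^{\infty} f_Y(y) f_X(z - y)\,dy \qquad (4.80b)$$

Using Eqs. (4.19) and (3.30), redo Prob. 4.16(a); that is, find the pdf of Z = X + Y.

Let Z = X + Y and W = X. The transformation z = x + y, w = x has the inverse transformation x = w, y = z - w, and
$$J(x, y) = \begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\[2mm] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{vmatrix} = \begin{vmatrix} 1 & 1 \\ 1 & 0 \end{vmatrix} = -1$$
By Eq. (4.19), we obtain
$$f_{ZW}(z, w) = f_{XY}(w, z - w)$$
Hence, by Eq. (3.30), we get
$$f_Z(z) = \int_{-\infty}^{\infty} f_{ZW}(z, w)\,dw = \int_{-\infty}^{\infty} f_{XY}(w, z - w)\,dw = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\,dx$$

Suppose that X and Y are independent standard normal r.v.'s. Find the pdf of Z = X + Y.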

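The pdfs of X and Y are
$$f_X(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} \qquad f_Y(y) = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}$$
Then, by Eq. (4.80a), we have
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-(z^2 - 2zx + 2x^2)/2}\,dx$$
Now, $z^2 - 2zx + 2x^2 = (\sqrt{2}\,x - z/\sqrt{2})^2 + z^2/2$, and we have
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/4}\,\frac{1}{\sqrt{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du$$
with the change of variables $u = \sqrt{2}\,x - z/\sqrt{2}$. Since the integrand is the pdf of N(0; 1), the integral is equal to unity, and we get
$$f_Z(z) = \frac{1}{\sqrt{2\pi}\,\sqrt{2}}\,e^{-z^2/4} = \frac{1}{\sqrt{2\pi}\,\sqrt{2}}\,e^{-z^2/[2(\sqrt{2})^2]}$$
which is the pdf of N(0; 2). Thus, Z is a normal r.v. with zero mean and variance 2.

Let X and Y be independent uniform r.v.'s over (0, 1). Find and sketch the pdf of Z = X + Y.

Since X and Y are independent, we have
$$f_{XY}(x, y) = f_X(x)\,f_Y(y) = \begin{cases} 1 & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
The range of Z is (0, 2), and
$$F_Z(z) = P(X + Y \le z) = \iint_{x+y \le z} f_{XY}(x, y)\,dx\,dy = \iint_{x+y \le z} dx\,dy$$
If 0 < z < 1 [Fig. 4-7(a)],
$$F_Z(z) = \iint_{x+y \le z} dx\,dy = \text{shaded area} = \frac{z^2}{2} \qquad\text{and}\qquad f_Z(z) = \frac{d}{dz}F_Z(z) = z$$
If 1 < z < 2 [Fig. 4-7(b)],
$$F_Z(z) = \iint_{x+y \le z} dx\,dy = \text{shaded area} = 1 - \frac{(2 - z)^2}{2} \qquad\text{and}\qquad f_Z(z) = \frac{d}{dz}F_Z(z) = 2 - z$$
Hence,
$$f_Z(z) = \begin{cases} z & 0 < z < 1 \\ 2 - z & 1 < z < 2 \\ 0 & \text{otherwise} \end{cases}$$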

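Fig. 4-7

which is sketched in Fig. 4-7(c). Note that the same result can be obtained by the convolution of $f_X(z)$ and $f_Y(z)$.

Let X and Y be independent gamma r.v.'s with respective parameters $(\alpha, \lambda)$ and $(\beta, \lambda)$. Show that Z = X + Y is also a gamma r.v. with parameters $(\alpha + \beta, \lambda)$.

From Eq. (2.76) (Prob. 2.24),
$$f_X(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)} & x > 0 \\ 0 & x < 0 \end{cases} \qquad f_Y(y) = \begin{cases} \dfrac{\lambda e^{-\lambda y}(\lambda y)^{\beta-1}}{\Gamma(\beta)} & y > 0 \\ 0 & y < 0 \end{cases}$$
The range of Z is $(0, \infty)$, and using Eq. (4.80a), we have
$$f_Z(z) = \int_0^z f_X(x)\,f_Y(z - x)\,dx = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\,e^{-\lambda z}\int_0^z x^{\alpha-1}(z - x)^{\beta-1}\,dx$$
By the change of variable w = x/z, we have
$$f_Z(z) = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\,e^{-\lambda z} z^{\alpha+\beta-1}\int_0^1 w^{\alpha-1}(1 - w)^{\beta-1}\,dw = k\,e^{-\lambda z} z^{\alpha+\beta-1}$$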

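where k is a constant which does not depend on z. The value of k is determined as follows: Using Eq. (2.22) and definition (2.77) of the gamma function, we have
$$\int_0^{\infty} f_Z(z)\,dz = k\int_0^{\infty} e^{-\lambda z} z^{\alpha+\beta-1}\,dz = \frac{k}{\lambda^{\alpha+\beta}}\int_0^{\infty} e^{-v} v^{\alpha+\beta-1}\,dv = \frac{k}{\lambda^{\alpha+\beta}}\,\Gamma(\alpha + \beta) = 1$$
Hence, $k = \lambda^{\alpha+\beta}/\Gamma(\alpha + \beta)$ and
$$f_Z(z) = \frac{\lambda e^{-\lambda z}(\lambda z)^{\alpha+\beta-1}}{\Gamma(\alpha + \beta)} \qquad z > 0$$
which indicates that Z is a gamma r.v. with parameters $(\alpha + \beta, \lambda)$.

Consider two r.v.'s X and Y with joint pdf $f_{XY}(x, y)$. Determine the pdf of Z = XY.

Let Z = XY and W = X. The transformation z = xy, w = x has the inverse transformation x = w, y = z/w, and
$$\bar{J}(z, w) = \begin{vmatrix} \partial x/\partial z & \partial x/\partial w \\ \partial y/\partial z & \partial y/\partial w \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ 1/w & -z/w^2 \end{vmatrix} = -\frac{1}{w}$$
Thus, by Eq. (4.23), we obtain
$$f_{ZW}(z, w) = \frac{1}{|w|}\,f_{XY}\!\left(w, \frac{z}{w}\right)$$
and the marginal pdf of Z is
$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|w|}\,f_{XY}\!\left(w, \frac{z}{w}\right) dw \qquad (4.82)$$

Let X and Y be independent uniform r.v.'s over (0, 1). Find the pdf of Z = XY.

We have
$$f_{XY}(x, y) = \begin{cases} 1 & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
The range of Z is (0, 1). Then
$$f_{XY}\!\left(w, \frac{z}{w}\right) = \begin{cases} 1 & 0 < w < 1,\ 0 < z/w < 1 \\ 0 & \text{otherwise} \end{cases}$$
or
$$f_{XY}\!\left(w, \frac{z}{w}\right) = \begin{cases} 1 & 0 < z < w < 1 \\ 0 & \text{otherwise} \end{cases}$$
By Eq. (4.82),
$$f_Z(z) = \int_z^1 \frac{1}{w}\,dw = -\ln z \qquad 0 < z < 1$$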

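Thus,
$$f_Z(z) = \begin{cases} -\ln z & 0 < z < 1 \\ 0 & \text{otherwise} \end{cases}$$

Consider two r.v.'s X and Y with joint pdf $f_{XY}(x, y)$. Determine the pdf of Z = X/Y.

Let Z = X/Y and W = Y. The transformation z = x/y, w = y has the inverse transformation x = zw, y = w, and
$$\bar{J}(z, w) = \begin{vmatrix} \partial x/\partial z & \partial x/\partial w \\ \partial y/\partial z & \partial y/\partial w \end{vmatrix} = \begin{vmatrix} w & z \\ 0 & 1 \end{vmatrix} = w$$
Thus, by Eq. (4.23), we obtain
$$f_{ZW}(z, w) = |w|\,f_{XY}(zw, w)$$
and the marginal pdf of Z is
$$f_Z(z) = \int_{-\infty}^{\infty} |w|\,f_{XY}(zw, w)\,dw \qquad (4.84)$$

Let X and Y be independent standard normal r.v.'s. Find the pdf of Z = X/Y.

Since X and Y are independent, using Eq. (4.84), we have
$$f_Z(z) = \int_{-\infty}^{\infty} |w|\,\frac{1}{2\pi}\,e^{-(z^2 w^2 + w^2)/2}\,dw = \frac{1}{\pi}\int_0^{\infty} w\,e^{-w^2(1+z^2)/2}\,dw = \frac{1}{\pi(1 + z^2)} \qquad -\infty < z < \infty$$
which is the pdf of a Cauchy r.v. with parameter 1.

Let X and Y be two r.v.'s with joint pdf $f_{XY}(x, y)$ and joint cdf $F_{XY}(x, y)$. Let Z = max(X, Y). (a) Find the cdf of Z. (b) Find the pdf of Z if X and Y are independent.

(a) The region in the xy plane corresponding to the event $\{\max(X, Y) \le z\}$ is shown as the shaded area in the accompanying figure. Then
$$F_Z(z) = P(Z \le z) = P(X \le z,\ Y \le z) = F_{XY}(z, z) \qquad (4.85)$$

(b) If X and Y are independent, then
$$F_Z(z) = F_X(z)\,F_Y(z)$$
and differentiating with respect to z gives
$$f_Z(z) = f_X(z)\,F_Y(z) + F_X(z)\,f_Y(z) \qquad (4.86)$$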
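The ratio result above can be checked numerically. The following is a minimal sketch, assuming a plain midpoint rule on a truncated interval; the function names, the truncation limit, and the step count are illustrative choices and are not part of the text. It evaluates the integral of $|w|\,f_X(zw)\,f_Y(w)$ for standard normal X and Y and compares it with the Cauchy pdf $1/[\pi(1 + z^2)]$.

```python
import math

def std_normal_pdf(x):
    """pdf of N(0; 1)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def ratio_pdf(z, limit=12.0, steps=24000):
    """Midpoint-rule value of the integral of |w| f_X(z w) f_Y(w) over (-limit, limit).

    The truncation limit and step count are assumptions made for this sketch.
    """
    h = 2.0 * limit / steps
    total = 0.0
    for i in range(steps):
        w = -limit + (i + 0.5) * h
        total += abs(w) * std_normal_pdf(z * w) * std_normal_pdf(w)
    return total * h

def cauchy_pdf(z):
    """pdf of a Cauchy r.v. with parameter 1."""
    return 1.0 / (math.pi * (1.0 + z * z))

# Largest gap between the numerical integral and the closed form at a few test points.
max_gap = max(abs(ratio_pdf(z) - cauchy_pdf(z)) for z in (-3.0, -0.5, 0.0, 1.0, 2.5))
print(max_gap)
```

The gap should be limited only by the quadrature error, which supports the closed-form result without replacing the derivation.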

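Let X and Y be two r.v.'s with joint pdf $f_{XY}(x, y)$ and joint cdf $F_{XY}(x, y)$. Let W = min(X, Y). (a) Find the cdf of W. (b) Find the pdf of W if X and Y are independent.

(a) The region in the xy plane corresponding to the event $\{\min(X, Y) \le w\}$ is shown as the shaded area in the accompanying figure. Then
$$P(W \le w) = P\{(X \le w) \cup (Y \le w)\} = P(X \le w) + P(Y \le w) - P\{(X \le w) \cap (Y \le w)\}$$
Thus,
$$F_W(w) = F_X(w) + F_Y(w) - F_{XY}(w, w)$$

(b) If X and Y are independent, then
$$F_W(w) = F_X(w) + F_Y(w) - F_X(w)\,F_Y(w)$$
and differentiating with respect to w gives
$$f_W(w) = f_X(w) + f_Y(w) - f_X(w)F_Y(w) - F_X(w)f_Y(w) = f_X(w)[1 - F_Y(w)] + f_Y(w)[1 - F_X(w)] \qquad (4.88)$$

Let X and Y be two r.v.'s with joint pdf $f_{XY}(x, y)$. Let
$$R = \sqrt{X^2 + Y^2} \qquad \Theta = \tan^{-1}\frac{Y}{X} \qquad (4.89)$$
Find $f_{R\Theta}(r, \theta)$ in terms of $f_{XY}(x, y)$.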

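We assume that $r \ge 0$ and $0 < \theta < 2\pi$. With this assumption, the transformation
$$\sqrt{x^2 + y^2} = r \qquad \tan^{-1}\frac{y}{x} = \theta$$
has the inverse transformation
$$x = r\cos\theta \qquad y = r\sin\theta$$
Since
$$\bar{J}(r, \theta) = \begin{vmatrix} \partial x/\partial r & \partial x/\partial\theta \\ \partial y/\partial r & \partial y/\partial\theta \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r$$
by Eq. (4.23) we obtain
$$f_{R\Theta}(r, \theta) = r\,f_{XY}(r\cos\theta, r\sin\theta) \qquad (4.90)$$

A voltage V is a function of time t and is given by
$$V(t) = X\cos\omega t + Y\sin\omega t$$
in which $\omega$ is a constant angular frequency and $X = Y = N(0; \sigma^2)$ and they are independent.
(a) Show that V(t) may be written as
$$V(t) = R\cos(\omega t - \Theta) \qquad (4.92)$$
(b) Find the pdfs of r.v.'s R and $\Theta$ and show that R and $\Theta$ are independent.

(a) We have
$$V(t) = X\cos\omega t + Y\sin\omega t = \sqrt{X^2 + Y^2}\left(\frac{X}{\sqrt{X^2 + Y^2}}\cos\omega t + \frac{Y}{\sqrt{X^2 + Y^2}}\sin\omega t\right)$$
$$= \sqrt{X^2 + Y^2}\,(\cos\Theta\cos\omega t + \sin\Theta\sin\omega t) = R\cos(\omega t - \Theta)$$
where
$$R = \sqrt{X^2 + Y^2} \qquad\text{and}\qquad \Theta = \tan^{-1}\frac{Y}{X}$$
which is the transformation (4.89).

(b) Since $X = Y = N(0; \sigma^2)$ and they are independent, we have
$$f_{XY}(x, y) = \frac{1}{2\pi\sigma^2}\,e^{-(x^2 + y^2)/(2\sigma^2)}$$
Thus, using Eq. (4.90), we get
$$f_{R\Theta}(r, \theta) = \frac{r}{2\pi\sigma^2}\,e^{-r^2/(2\sigma^2)}$$
Now
$$f_R(r) = \int_0^{2\pi} f_{R\Theta}(r, \theta)\,d\theta = \frac{r}{\sigma^2}\,e^{-r^2/(2\sigma^2)} \qquad f_\Theta(\theta) = \int_0^{\infty} f_{R\Theta}(r, \theta)\,dr = \frac{1}{2\pi}$$
and $f_{R\Theta}(r, \theta) = f_R(r)\,f_\Theta(\theta)$; hence, R and $\Theta$ are independent. Note that R is a Rayleigh r.v. (Prob. 2.23), and $\Theta$ is a uniform r.v. over $(0, 2\pi)$.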
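The polar rewriting of V(t) and the factorization of the joint pdf can be sanity-checked for particular values. The sketch below is illustrative only: the helper names and the sample values are assumptions, and `math.atan2` reduced modulo $2\pi$ is used as a stand-in for $\tan^{-1}(Y/X)$ so that the angle falls in $[0, 2\pi)$ in every quadrant.

```python
import math

def to_polar(x, y):
    """Return (r, theta) with r >= 0 and 0 <= theta < 2*pi."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2.0 * math.pi)
    return r, theta

def v_rect(x, y, omega, t):
    """V(t) = X cos(omega t) + Y sin(omega t) for fixed values x, y."""
    return x * math.cos(omega * t) + y * math.sin(omega * t)

def v_polar(x, y, omega, t):
    """V(t) = R cos(omega t - Theta) for the same values x, y."""
    r, theta = to_polar(x, y)
    return r * math.cos(omega * t - theta)

def rayleigh_pdf(r, sigma):
    """Marginal pdf of R."""
    return (r / sigma**2) * math.exp(-r * r / (2.0 * sigma**2))

def joint_pdf_polar(r, theta, sigma):
    """r * f_XY(r cos(theta), r sin(theta)) for independent N(0; sigma^2) components."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    f_xy = math.exp(-(x * x + y * y) / (2.0 * sigma**2)) / (2.0 * math.pi * sigma**2)
    return r * f_xy

# Illustrative sample values (hypothetical): one point in each quadrant.
samples = [(1.0, 2.0), (-0.7, 0.4), (-1.5, -2.5), (0.3, -0.9)]
identity_gap = max(
    abs(v_rect(x, y, 3.0, t) - v_polar(x, y, 3.0, t))
    for x, y in samples
    for t in (0.0, 0.25, 1.0, 2.0)
)
factor_gap = max(
    abs(joint_pdf_polar(r, th, 1.5) - rayleigh_pdf(r, 1.5) / (2.0 * math.pi))
    for r in (0.2, 1.0, 3.0)
    for th in (0.1, 2.0, 5.0)
)
print(identity_gap, factor_gap)
```

Both gaps should be at rounding level: the first reflects the trigonometric identity in part (a), the second the product form $f_R(r)\,f_\Theta(\theta)$ in part (b).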

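FUNCTIONS OF N RANDOM VARIABLES

Let X, Y, and Z be independent standard normal r.v.'s. Let $W = (X^2 + Y^2 + Z^2)^{1/2}$. Find the pdf of W.

We have
$$f_{XYZ}(x, y, z) = f_X(x)\,f_Y(y)\,f_Z(z) = \frac{1}{(2\pi)^{3/2}}\,e^{-(x^2 + y^2 + z^2)/2}$$
and
$$F_W(w) = P(W \le w) = P(X^2 + Y^2 + Z^2 \le w^2) = \iiint_{R_W} \frac{1}{(2\pi)^{3/2}}\,e^{-(x^2 + y^2 + z^2)/2}\,dx\,dy\,dz$$
where $R_W = \{(x, y, z): x^2 + y^2 + z^2 \le w^2\}$. Using spherical coordinates (Fig. 4-10), we have
$$x^2 + y^2 + z^2 = r^2 \qquad dx\,dy\,dz = r^2 \sin\theta\,dr\,d\theta\,d\varphi$$
and
$$F_W(w) = \frac{1}{(2\pi)^{3/2}} \int_0^{2\pi}\!\!\int_0^{\pi}\!\!\int_0^{w} e^{-r^2/2}\,r^2 \sin\theta\,dr\,d\theta\,d\varphi = \frac{1}{(2\pi)^{3/2}} \int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\,d\theta \int_0^{w} e^{-r^2/2}\,r^2\,dr = \sqrt{\frac{2}{\pi}} \int_0^{w} r^2 e^{-r^2/2}\,dr$$
Thus, the pdf of W is
$$f_W(w) = \frac{d}{dw}F_W(w) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\,w^2 e^{-w^2/2} & w > 0 \\ 0 & w < 0 \end{cases}$$

Fig. 4-10 Spherical coordinates.

Let $X_1, \ldots, X_n$ be n independent r.v.'s each with the identical pdf f(x). Let $Z = \max(X_1, \ldots, X_n)$. Find the pdf of Z.

The probability P(z < Z < z + dz) is equal to the probability that one of the r.v.'s falls in (z, z + dz) and all others are less than z. The probability that one of $X_i$ (i = 1, ..., n) falls in (z, z + dz) and all others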

are all less than z is
$$f(z)\,dz \left[\int_{-\infty}^{z} f(x)\,dx\right]^{n-1}$$
Since there are n ways of choosing the variables to be maximum, we have
$$f_Z(z) = n f(z) \left[\int_{-\infty}^{z} f(x)\,dx\right]^{n-1} = n f(z)\,[F(z)]^{n-1} \qquad (4.98)$$
When n = 2, Eq. (4.98) reduces to
$$f_Z(z) = 2 f(z) \int_{-\infty}^{z} f(x)\,dx = 2 f(z)\,F(z)$$
which is the same as Eq. (4.86) (Prob. 4.25) with $f_X(z) = f_Y(z) = f(z)$ and $F_X(z) = F_Y(z) = F(z)$.

Let $X_1, \ldots, X_n$ be n independent r.v.'s each with the identical pdf f(x). Let $W = \min(X_1, \ldots, X_n)$. Find the pdf of W.

The probability P(w < W < w + dw) is equal to the probability that one of the r.v.'s falls in (w, w + dw) and all others are greater than w. The probability that one of $X_i$ (i = 1, ..., n) falls in (w, w + dw) and all others are greater than w is
$$f(w)\,dw \left[\int_{w}^{\infty} f(x)\,dx\right]^{n-1}$$
Since there are n ways of choosing the variables to be minimum, we have
$$f_W(w) = n f(w) \left[\int_{w}^{\infty} f(x)\,dx\right]^{n-1} = n f(w)\,[1 - F(w)]^{n-1} \qquad (4.100)$$
When n = 2, Eq. (4.100) reduces to
$$f_W(w) = 2 f(w) \int_{w}^{\infty} f(x)\,dx = 2 f(w)\,[1 - F(w)]$$
which is the same as Eq. (4.88) (Prob. 4.26) with $f_X(w) = f_Y(w) = f(w)$ and $F_X(w) = F_Y(w) = F(w)$.

Let $X_i$, i = 1, ..., n, be n independent gamma r.v.'s with respective parameters $(\alpha_i, \lambda)$, i = 1, ..., n. Let
$$Y = X_1 + \cdots + X_n = \sum_{i=1}^{n} X_i$$
Show that Y is also a gamma r.v. with parameters $(\sum_{i=1}^{n} \alpha_i, \lambda)$.

We prove this proposition by induction. Let us assume that the proposition is true for n = k; that is,
$$Z = X_1 + \cdots + X_k = \sum_{i=1}^{k} X_i$$
is a gamma r.v. with parameters $(\beta, \lambda) = (\sum_{i=1}^{k} \alpha_i, \lambda)$. Let
$$W = Z + X_{k+1} = \sum_{i=1}^{k+1} X_i$$
Then, by the result of Prob. 4.20, we see that W is a gamma r.v. with parameters $(\beta + \alpha_{k+1}, \lambda) = (\sum_{i=1}^{k+1} \alpha_i, \lambda)$. Hence, the proposition is true for n = k + 1. Next, by the result of Prob. 4.20, the proposition is true for n = 2. Thus, we conclude that the proposition is true for any $n \ge 2$.
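The formulas for the maximum and minimum of n identically distributed independent r.v.'s, Eqs. (4.98) and (4.100), translate directly into code. The sketch below is an illustration under assumed choices: the helper names, the exponential example with rate 2, and the test points are not from the text. It uses the standard fact that the minimum of n independent exponential r.v.'s with parameter $\lambda$ is exponential with parameter $n\lambda$.

```python
import math

def pdf_max(z, n, f, F):
    """pdf of max(X_1, ..., X_n): n f(z) [F(z)]^(n-1)."""
    return n * f(z) * F(z) ** (n - 1)

def pdf_min(w, n, f, F):
    """pdf of min(X_1, ..., X_n): n f(w) [1 - F(w)]^(n-1)."""
    return n * f(w) * (1.0 - F(w)) ** (n - 1)

LAM = 2.0  # illustrative exponential parameter

def f_exp(x):
    """pdf of an exponential r.v. with parameter LAM."""
    return LAM * math.exp(-LAM * x) if x > 0 else 0.0

def F_exp(x):
    """cdf of an exponential r.v. with parameter LAM."""
    return 1.0 - math.exp(-LAM * x) if x > 0 else 0.0

n = 5
points = [0.1, 0.3, 1.0]

# The minimum should match an exponential pdf with parameter n * LAM.
min_gap = max(
    abs(pdf_min(w, n, f_exp, F_exp) - n * LAM * math.exp(-n * LAM * w)) for w in points
)

# For n = 2 the maximum should match f_X F_Y + F_X f_Y with identical distributions.
two_gap = max(
    abs(pdf_max(z, 2, f_exp, F_exp) - (f_exp(z) * F_exp(z) + F_exp(z) * f_exp(z)))
    for z in points
)
print(min_gap, two_gap)
```

Passing the pdf and cdf in as functions keeps the two formulas independent of any particular distribution.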

Let $X_1, \ldots, X_n$ be n independent exponential r.v.'s each with parameter $\lambda$. Let
$$Y = X_1 + \cdots + X_n = \sum_{i=1}^{n} X_i$$
Show that Y is a gamma r.v. with parameters $(n, \lambda)$.

We note that an exponential r.v. with parameter $\lambda$ is a gamma r.v. with parameters $(1, \lambda)$ (Prob. 2.24). Thus, from the result of Prob. 4.32 and setting $\alpha_i = 1$, we conclude that Y is a gamma r.v. with parameters $(n, \lambda)$.

Let $Z_1, \ldots, Z_n$ be n independent standard normal r.v.'s. Let
$$Y = Z_1^2 + \cdots + Z_n^2 = \sum_{i=1}^{n} Z_i^2$$
Find the pdf of Y.

Let $Y_i = Z_i^2$. Then by Eq. (4.75) (Prob. 4.7), the pdf of $Y_i$ is
$$f_{Y_i}(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi y}}\,e^{-y/2} & y > 0 \\ 0 & y < 0 \end{cases}$$
Now, using Eq. (2.80), we can rewrite
$$\frac{1}{\sqrt{2\pi y}}\,e^{-y/2} = \frac{\tfrac{1}{2}\,e^{-y/2}\,(y/2)^{1/2 - 1}}{\Gamma(\tfrac{1}{2})}$$
and we recognize the above as the pdf of a gamma r.v. with parameters $(\tfrac{1}{2}, \tfrac{1}{2})$ [Eq. (2.76)]. Thus, by the result of Prob. 4.32, we conclude that Y is the gamma r.v. with parameters $(n/2, \tfrac{1}{2})$ and
$$f_Y(y) = \begin{cases} \dfrac{\tfrac{1}{2}\,e^{-y/2}\,(y/2)^{n/2 - 1}}{\Gamma(n/2)} & y > 0 \\ 0 & y < 0 \end{cases} \qquad (4.102)$$
When n is an even integer, $\Gamma(n/2) = [(n/2) - 1]!$, whereas when n is odd, $\Gamma(n/2)$ can be obtained from $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ [Eq. (2.78)] and $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$ [Eq. (2.80)].

Note that Equation (4.102) is referred to as the chi-square ($\chi^2$) density function with n degrees of freedom, and Y is known as the chi-square ($\chi^2$) r.v. with n degrees of freedom. It is important to recognize that the sum of the squares of n independent standard normal r.v.'s is a chi-square r.v. with n degrees of freedom. The chi-square distribution plays an important role in statistical analysis.

Let $X_1$, $X_2$, and $X_3$ be independent standard normal r.v.'s. Let
$$Y_1 = X_1 + X_2 + X_3 \qquad Y_2 = X_1 - X_2 \qquad Y_3 = X_2 - X_3$$
Determine the joint pdf of $Y_1$, $Y_2$, and $Y_3$.

Let
$$y_1 = x_1 + x_2 + x_3 \qquad y_2 = x_1 - x_2 \qquad y_3 = x_2 - x_3 \qquad (4.103)$$
By Eq. (4.32), the jacobian of transformation (4.103) is
$$J(x_1, x_2, x_3) = \begin{vmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{vmatrix} = 3$$

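Thus, solving the system (4.103), we get
$$x_1 = \tfrac{1}{3}(y_1 + 2y_2 + y_3) \qquad x_2 = \tfrac{1}{3}(y_1 - y_2 + y_3) \qquad x_3 = \tfrac{1}{3}(y_1 - y_2 - 2y_3)$$
Then by Eq. (4.31), we obtain
$$f_{Y_1Y_2Y_3}(y_1, y_2, y_3) = \tfrac{1}{3}\,f_{X_1X_2X_3}(x_1, x_2, x_3)$$
Since $X_1$, $X_2$, and $X_3$ are independent,
$$f_{X_1X_2X_3}(x_1, x_2, x_3) = \prod_{i=1}^{3} f_{X_i}(x_i) = \frac{1}{(2\pi)^{3/2}}\,e^{-(x_1^2 + x_2^2 + x_3^2)/2}$$
Hence,
$$f_{Y_1Y_2Y_3}(y_1, y_2, y_3) = \frac{1}{3(2\pi)^{3/2}}\,e^{-q(y_1, y_2, y_3)/2}$$
where
$$q(y_1, y_2, y_3) = x_1^2 + x_2^2 + x_3^2 = \tfrac{1}{3}y_1^2 + \tfrac{2}{3}y_2^2 + \tfrac{2}{3}y_3^2 + \tfrac{2}{3}y_2 y_3$$

EXPECTATION

4.36. Let X be a uniform r.v. over (0, 1) and $Y = e^X$. (a) Find E(Y) by using $f_Y(y)$. (b) Find E(Y) by using $f_X(x)$.

(a) From Eq. (4.76) (Prob. 4.9),
$$f_Y(y) = \begin{cases} \dfrac{1}{y} & 1 < y < e \\ 0 & \text{otherwise} \end{cases}$$
Hence,
$$E(Y) = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy = \int_1^e dy = e - 1$$
(b) The pdf of X is
$$f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
Then, by Eq. (4.33),
$$E(Y) = \int_{-\infty}^{\infty} e^x f_X(x)\,dx = \int_0^1 e^x\,dx = e - 1$$

Let Y = aX + b, where a and b are constants. Show that
$$\text{(a)}\ E(Y) = E(aX + b) = aE(X) + b \qquad (4.105)$$
$$\text{(b)}\ \mathrm{Var}(Y) = \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X) \qquad (4.106)$$
We verify for the continuous case. The proof for the discrete case is similar.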

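(a) For Y = aX + b, we have
$$E(Y) = E(aX + b) = \int_{-\infty}^{\infty} (ax + b)\,f_X(x)\,dx = a\int_{-\infty}^{\infty} x\,f_X(x)\,dx + b\int_{-\infty}^{\infty} f_X(x)\,dx = aE(X) + b$$
(b) Using Eq. (4.105), we have
$$\mathrm{Var}(Y) = \mathrm{Var}(aX + b) = E\{(aX + b - [aE(X) + b])^2\} = E\{a^2[X - E(X)]^2\} = a^2 E\{[X - E(X)]^2\} = a^2\,\mathrm{Var}(X)$$

Verify Eq. (4.39).

Using Eqs. (3.58) and (3.38), we have
$$E[E(Y \mid X)] = \int_{-\infty}^{\infty} E(Y \mid x)\,f_X(x)\,dx = \int_{-\infty}^{\infty}\!\left[\int_{-\infty}^{\infty} y\,f_{Y|X}(y \mid x)\,dy\right] f_X(x)\,dx$$
$$= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y\,\frac{f_{XY}(x, y)}{f_X(x)}\,f_X(x)\,dx\,dy = \int_{-\infty}^{\infty} y \left[\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\right] dy = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy = E(Y)$$

Let Z = aX + bY, where a and b are constants. Show that
$$E(Z) = E(aX + bY) = aE(X) + bE(Y) \qquad (4.107)$$
We verify for the continuous case. The proof for the discrete case is similar.
$$E(Z) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (ax + by)\,f_{XY}(x, y)\,dx\,dy = a\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,f_{XY}(x, y)\,dx\,dy + b\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y\,f_{XY}(x, y)\,dx\,dy = aE(X) + bE(Y)$$
Note that Eq. (4.107) (the linearity of E) can be easily extended to n r.v.'s:
$$E\!\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i) \qquad (4.108)$$

Let Y = aX + b. (a) Find the covariance of X and Y. (b) Find the correlation coefficient of X and Y.

(a) By Eq. (4.107), we have
$$E(XY) = E[X(aX + b)] = aE(X^2) + bE(X) \qquad E(Y) = E(aX + b) = aE(X) + b$$
Thus, the covariance of X and Y is
$$\mathrm{Cov}(X, Y) = \sigma_{XY} = E(XY) - E(X)E(Y) = aE(X^2) + bE(X) - E(X)[aE(X) + b] = a\{E(X^2) - [E(X)]^2\} = a\sigma_X^2$$
(b) By Eq. (4.106), we have $\sigma_Y = |a|\,\sigma_X$. Thus, the correlation coefficient of X and Y is [Eq. (3.53)]
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{a\sigma_X^2}{\sigma_X\,|a|\,\sigma_X} = \frac{a}{|a|} = \begin{cases} 1 & a > 0 \\ -1 & a < 0 \end{cases}$$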
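The covariance and correlation results for Y = aX + b can be illustrated with a small discrete example. This is a sketch only: the pmf below is a hypothetical choice, and the helper name is not from the text. The moments are computed directly from their definitions, so the outputs should agree with $\mathrm{Cov}(X, Y) = a\sigma_X^2$ and $\rho_{XY} = \pm 1$.

```python
import math

def linear_cov_and_rho(pmf_x, a, b):
    """Return (Cov(X, Y), Var(X), rho_XY) for Y = a*X + b.

    pmf_x maps each value of X to its probability.
    """
    e_x = sum(x * p for x, p in pmf_x.items())
    e_x2 = sum(x * x * p for x, p in pmf_x.items())
    e_y = sum((a * x + b) * p for x, p in pmf_x.items())
    e_y2 = sum((a * x + b) ** 2 * p for x, p in pmf_x.items())
    e_xy = sum(x * (a * x + b) * p for x, p in pmf_x.items())
    var_x = e_x2 - e_x ** 2
    var_y = e_y2 - e_y ** 2
    cov = e_xy - e_x * e_y
    return cov, var_x, cov / math.sqrt(var_x * var_y)

# Hypothetical pmf used only for illustration.
PMF_X = {0.0: 0.2, 1.0: 0.5, 3.0: 0.3}

cov_pos, var_x, rho_pos = linear_cov_and_rho(PMF_X, 2.0, 1.0)
cov_neg, _, rho_neg = linear_cov_and_rho(PMF_X, -0.5, 4.0)
print(cov_pos, rho_pos, cov_neg, rho_neg)
```

The sign of a alone decides whether the correlation coefficient is +1 or -1; the constant b drops out of both results.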

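Verify Eq. (4.36).

Since X and Y are independent, we have
$$E[g(X)h(Y)] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x)h(y)\,f_{XY}(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x)h(y)\,f_X(x)f_Y(y)\,dx\,dy$$
$$= \int_{-\infty}^{\infty} g(x)f_X(x)\,dx \int_{-\infty}^{\infty} h(y)f_Y(y)\,dy = E[g(X)]\,E[h(Y)]$$
The proof for the discrete case is similar.

Let X and Y be defined by
$$X = \cos\Theta \qquad Y = \sin\Theta$$
where $\Theta$ is a random variable uniformly distributed over $(0, 2\pi)$. (a) Show that X and Y are uncorrelated. (b) Show that X and Y are not independent.

(a) We have
$$f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi} & 0 < \theta < 2\pi \\ 0 & \text{otherwise} \end{cases}$$
Then
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^{2\pi} \cos\theta\,f_\Theta(\theta)\,d\theta = \frac{1}{2\pi}\int_0^{2\pi} \cos\theta\,d\theta = 0$$
Similarly,
$$E(Y) = \frac{1}{2\pi}\int_0^{2\pi} \sin\theta\,d\theta = 0$$
$$E(XY) = \frac{1}{2\pi}\int_0^{2\pi} \cos\theta\sin\theta\,d\theta = \frac{1}{4\pi}\int_0^{2\pi} \sin 2\theta\,d\theta = 0 = E(X)E(Y)$$
Thus, by Eq. (3.52), X and Y are uncorrelated.

(b) We have
$$E(X^2) = \frac{1}{2\pi}\int_0^{2\pi} \cos^2\theta\,d\theta = \frac{1}{4\pi}\int_0^{2\pi} (1 + \cos 2\theta)\,d\theta = \frac{1}{2}$$
$$E(Y^2) = \frac{1}{2\pi}\int_0^{2\pi} \sin^2\theta\,d\theta = \frac{1}{4\pi}\int_0^{2\pi} (1 - \cos 2\theta)\,d\theta = \frac{1}{2}$$
$$E(X^2Y^2) = \frac{1}{2\pi}\int_0^{2\pi} \cos^2\theta\sin^2\theta\,d\theta = \frac{1}{16\pi}\int_0^{2\pi} (1 - \cos 4\theta)\,d\theta = \frac{1}{8}$$
Hence,
$$E(X^2Y^2) = \frac{1}{8} \ne \frac{1}{4} = E(X^2)E(Y^2)$$
If X and Y were independent, then by Eq. (4.36), we would have $E(X^2Y^2) = E(X^2)E(Y^2)$. Therefore, X and Y are not independent.

Let $X_1, \ldots, X_n$ be n r.v.'s. Show that
$$\mathrm{Var}\!\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j) \qquad (4.111)$$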

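If $X_1, \ldots, X_n$ are pairwise independent, then
$$\mathrm{Var}\!\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\,\mathrm{Var}(X_i) \qquad (4.112)$$

Let
$$Y = \sum_{i=1}^{n} a_i X_i$$
Then by Eq. (4.108), we have
$$\mathrm{Var}(Y) = E\{[Y - E(Y)]^2\} = E\!\left\{\left[\sum_{i=1}^{n} a_i\,(X_i - E(X_i))\right]^2\right\} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,E\{[X_i - E(X_i)][X_j - E(X_j)]\} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j)$$
If $X_1, \ldots, X_n$ are pairwise independent, then (Prob. 3.22)
$$\mathrm{Cov}(X_i, X_j) = \begin{cases} \mathrm{Var}(X_i) & i = j \\ 0 & i \ne j \end{cases}$$
and Eq. (4.111) reduces to
$$\mathrm{Var}\!\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\,\mathrm{Var}(X_i)$$

MOMENT GENERATING FUNCTIONS

Let the moment of a discrete r.v. X be given by

Find the moment generating function of X. Find P(X = 0) and P(X = 1).

By Eq. (4.41), the moment generating function of X is

By definition (4.40),

Thus, equating Eqs. (4.113) and (4.114), we obtain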

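Let X be a Bernoulli r.v. (a) Find the moment generating function of X. (b) Find the mean and variance of X.

(a) By definition (4.40) and Eq. (2.32),
$$M_X(t) = E(e^{tX}) = \sum_i e^{t x_i}\,p_X(x_i) = e^{t(0)}p_X(0) + e^{t(1)}p_X(1) = (1 - p) + pe^t \qquad (4.115)$$
(b) By Eq. (4.42),
$$E(X) = M_X'(0) = p \qquad E(X^2) = M_X''(0) = p$$
Hence,
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p)$$

Let X be a binomial r.v. with parameters (n, p). Find the moment generating function of X. Find the mean and variance of X.

By definition (4.40) and Eq. (2.36), and letting q = 1 - p, we get
$$M_X(t) = E(e^{tX}) = \sum_{k=0}^{n} e^{tk}\binom{n}{k} p^k q^{n-k} = \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k q^{n-k} = (q + pe^t)^n \qquad (4.116)$$
The first two derivatives of $M_X(t)$ are
$$M_X'(t) = n(q + pe^t)^{n-1}pe^t \qquad M_X''(t) = n(q + pe^t)^{n-1}pe^t + n(n - 1)(q + pe^t)^{n-2}(pe^t)^2$$
Thus, by Eq. (4.42),
$$E(X) = M_X'(0) = np \qquad E(X^2) = M_X''(0) = np + n(n - 1)p^2$$
Hence,
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = np + n(n - 1)p^2 - n^2p^2 = np(1 - p)$$

Let X be a Poisson r.v. with parameter $\lambda$. Find the moment generating function of X. Find the mean and variance of X.

By definition (4.40) and Eq. (2.40),
$$M_X(t) = E(e^{tX}) = \sum_{i=0}^{\infty} e^{ti}\,e^{-\lambda}\frac{\lambda^i}{i!} = e^{-\lambda}\sum_{i=0}^{\infty} \frac{(\lambda e^t)^i}{i!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t - 1)} \qquad (4.117)$$
The first two derivatives of $M_X(t)$ are
$$M_X'(t) = \lambda e^t e^{\lambda(e^t - 1)} \qquad M_X''(t) = (\lambda e^t)^2 e^{\lambda(e^t - 1)} + \lambda e^t e^{\lambda(e^t - 1)}$$
Thus, by Eq. (4.42),
$$E(X) = M_X'(0) = \lambda \qquad E(X^2) = M_X''(0) = \lambda^2 + \lambda$$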

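Hence,
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$

Let X be an exponential r.v. with parameter $\lambda$. Find the moment generating function of X. Find the mean and variance of X.

By definition (4.40) and Eq. (2.48),
$$M_X(t) = E(e^{tX}) = \int_0^{\infty} \lambda e^{-\lambda x} e^{tx}\,dx = \frac{\lambda}{\lambda - t} \qquad \lambda > t \qquad (4.118)$$
The first two derivatives of $M_X(t)$ are
$$M_X'(t) = \frac{\lambda}{(\lambda - t)^2} \qquad M_X''(t) = \frac{2\lambda}{(\lambda - t)^3}$$
Thus, by Eq. (4.42),
$$E(X) = M_X'(0) = \frac{1}{\lambda} \qquad E(X^2) = M_X''(0) = \frac{2}{\lambda^2}$$
Hence,
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$

Find the moment generating function of the standard normal r.v. X = N(0; 1) and calculate the first three moments of X.

By definition (4.40) and Eq. (2.52),
$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,e^{tx}\,dx$$
Combining the exponents and completing the square, that is,
$$-\frac{x^2}{2} + tx = -\frac{(x - t)^2}{2} + \frac{t^2}{2}$$
we obtain
$$M_X(t) = e^{t^2/2}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(x - t)^2/2}\,dx = e^{t^2/2} \qquad (4.119)$$
since the integrand is the pdf of N(t; 1). Differentiating $M_X(t)$ with respect to t three times, we have
$$M_X'(t) = te^{t^2/2} \qquad M_X''(t) = (t^2 + 1)e^{t^2/2} \qquad M_X^{(3)}(t) = (t^3 + 3t)e^{t^2/2}$$
Thus, by Eq. (4.42),
$$E(X) = M_X'(0) = 0 \qquad E(X^2) = M_X''(0) = 1 \qquad E(X^3) = M_X^{(3)}(0) = 0$$

Let Y = aX + b. Let $M_X(t)$ be the moment generating function of X. Show that the moment generating function of Y is given by
$$M_Y(t) = e^{tb}M_X(at) \qquad (4.120)$$
By Eqs. (4.40) and (4.105),
$$M_Y(t) = E(e^{tY}) = E[e^{t(aX + b)}] = e^{tb}E(e^{atX}) = e^{tb}M_X(at)$$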
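The rule used throughout these problems, that the kth moment is the kth derivative of $M_X(t)$ at t = 0, can be checked numerically. The sketch below is a rough illustration, assuming central finite differences with a step of $10^{-4}$; the helper names and the parameter value $\lambda = 2$ are illustrative choices, not part of the text.

```python
import math

def first_two_moments(mgf, h=1e-4):
    """Approximate E(X) = M'(0) and E(X^2) = M''(0) by central differences."""
    m1 = (mgf(h) - mgf(-h)) / (2.0 * h)
    m2 = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2

LAM = 2.0  # illustrative parameter

def mgf_poisson(t):
    """Moment generating function of a Poisson r.v. with parameter LAM."""
    return math.exp(LAM * (math.exp(t) - 1.0))

def mgf_exponential(t):
    """Moment generating function of an exponential r.v. with parameter LAM (t < LAM)."""
    return LAM / (LAM - t)

def mgf_std_normal(t):
    """Moment generating function of N(0; 1)."""
    return math.exp(t * t / 2.0)

results = {}
for name, mgf in [("poisson", mgf_poisson),
                  ("exponential", mgf_exponential),
                  ("std_normal", mgf_std_normal)]:
    m1, m2 = first_two_moments(mgf)
    results[name] = (m1, m2 - m1 * m1)  # (mean, variance)
    print(name, results[name])
```

Under these assumptions the printed means and variances should be close to $(\lambda, \lambda)$, $(1/\lambda, 1/\lambda^2)$, and (0, 1), up to finite-difference error.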

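Find the moment generating function of a normal r.v. $N(\mu; \sigma^2)$.

If X is N(0; 1), then from Prob. 4.1 (or Prob. 4.37), we see that $Y = \sigma X + \mu$ is $N(\mu; \sigma^2)$. Then by setting $a = \sigma$ and $b = \mu$ in Eq. (4.120) (Prob. 4.50) and using Eq. (4.119), we get
$$M_Y(t) = e^{\mu t}M_X(\sigma t) = e^{\mu t}e^{(\sigma t)^2/2} = e^{\mu t + \sigma^2 t^2/2} \qquad (4.121)$$

Let $X_1, \ldots, X_n$ be n independent r.v.'s and let the moment generating function of $X_i$ be $M_{X_i}(t)$. Let $Y = X_1 + \cdots + X_n$. Find the moment generating function of Y.

By definition (4.40),
$$M_Y(t) = E(e^{tY}) = E[e^{t(X_1 + \cdots + X_n)}] = E(e^{tX_1} \cdots e^{tX_n}) = E(e^{tX_1}) \cdots E(e^{tX_n}) = M_{X_1}(t) \cdots M_{X_n}(t) \qquad (4.122)$$
where the expectation of the product factors because the $X_i$ are independent.

Show that if $X_1, \ldots, X_n$ are independent Bernoulli r.v.'s with the parameter p, then $Y = X_1 + \cdots + X_n$ is a binomial r.v. with the parameters (n, p).

Using Eqs. (4.122) and (4.115), the moment generating function of Y is
$$M_Y(t) = M_{X_1}(t) \cdots M_{X_n}(t) = (q + pe^t)^n \qquad q = 1 - p$$
which is the moment generating function of a binomial r.v. with parameters (n, p) [Eq. (4.116)]. Hence, Y is a binomial r.v. with parameters (n, p).

Show that if $X_1, \ldots, X_n$ are independent Poisson r.v.'s $X_i$ having parameter $\lambda_i$, then $Y = X_1 + \cdots + X_n$ is also a Poisson r.v. with parameter $\lambda = \lambda_1 + \cdots + \lambda_n$.

Using Eqs. (4.122) and (4.117), the moment generating function of Y is
$$M_Y(t) = \prod_{i=1}^{n} e^{\lambda_i(e^t - 1)} = e^{(\sum \lambda_i)(e^t - 1)} = e^{\lambda(e^t - 1)}$$
which is the moment generating function of a Poisson r.v. with parameter $\lambda$. Hence, Y is a Poisson r.v. with parameter $\lambda = \sum \lambda_i = \lambda_1 + \cdots + \lambda_n$. Note that the earlier result for the sum of two independent Poisson r.v.'s is the special case n = 2.

Show that if $X_1, \ldots, X_n$ are independent normal r.v.'s and $X_i = N(\mu_i; \sigma_i^2)$, then $Y = X_1 + \cdots + X_n$ is also a normal r.v. with mean $\mu = \mu_1 + \cdots + \mu_n$ and variance $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$.

Using Eqs. (4.122) and (4.121), the moment generating function of Y is
$$M_Y(t) = \prod_{i=1}^{n} e^{\mu_i t + \sigma_i^2 t^2/2} = e^{(\mu_1 + \cdots + \mu_n)t + (\sigma_1^2 + \cdots + \sigma_n^2)t^2/2} = e^{\mu t + \sigma^2 t^2/2}$$
which is the moment generating function of a normal r.v. with mean $\mu$ and variance $\sigma^2$. Hence, Y is a normal r.v. with mean $\mu = \mu_1 + \cdots + \mu_n$ and variance $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$. Note that the earlier result for the sum of two independent standard normal r.v.'s is the special case n = 2 with $\mu_i = 0$ and $\sigma_i^2 = 1$.

Find the moment generating function of a gamma r.v. Y with parameters $(n, \lambda)$.

From Prob. 4.33, we see that if $X_1, \ldots, X_n$ are independent exponential r.v.'s, each with parameter $\lambda$, then $Y = X_1 + \cdots + X_n$ is a gamma r.v. with parameters $(n, \lambda)$. Thus, by Eqs. (4.122) and (4.118), the moment generating function of Y is
$$M_Y(t) = \prod_{i=1}^{n} \frac{\lambda}{\lambda - t} = \left(\frac{\lambda}{\lambda - t}\right)^n \qquad \lambda > t$$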

CHARACTERISTIC FUNCTIONS

The r.v. X can take on the values $x_1 = -1$ and $x_2 = +1$ with pmfs $p_X(x_1) = p_X(x_2) = 0.5$. Determine the characteristic function of X.

By definition (4.50), the characteristic function of X is
$$\Psi_X(\omega) = 0.5e^{-j\omega} + 0.5e^{j\omega} = \tfrac{1}{2}(e^{j\omega} + e^{-j\omega}) = \cos\omega$$

Find the characteristic function of a Cauchy r.v. X with parameter a and pdf given by
$$f_X(x) = \frac{a}{\pi(x^2 + a^2)} \qquad -\infty < x < \infty$$
By direct integration (or from the Table of Fourier transforms in Appendix B), we have the following Fourier transform pair:
$$e^{-a|x|} \leftrightarrow \frac{2a}{\omega^2 + a^2}$$
Now, by the duality property of the Fourier transform, we have the following Fourier transform pair:
$$\frac{2a}{x^2 + a^2} \leftrightarrow 2\pi e^{-a|-\omega|} = 2\pi e^{-a|\omega|}$$
or (by the linearity property of the Fourier transform)
$$\frac{a}{\pi(x^2 + a^2)} \leftrightarrow e^{-a|\omega|}$$
Thus, the characteristic function of X is
$$\Psi_X(\omega) = e^{-a|\omega|} \qquad (4.124)$$
Note that the moment generating function of the Cauchy r.v. X does not exist, since $E(X^n) \to \infty$ for $n \ge 2$.

The characteristic function of a r.v. X is given by

Find the pdf of X.

From formula (4.51), we obtain the pdf of X as

Find the characteristic function of a normal r.v. $X = N(\mu; \sigma^2)$.

The moment generating function of $N(\mu; \sigma^2)$ is [Eq. (4.121)]
$$M_X(t) = e^{\mu t + \sigma^2 t^2/2}$$

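Thus the characteristic function of $N(\mu; \sigma^2)$ is obtained by setting $t = j\omega$ in $M_X(t)$; that is,
$$\Psi_X(\omega) = M_X(j\omega) = e^{j\omega\mu - \sigma^2\omega^2/2}$$

Let Y = aX + b. Show that if $\Psi_X(\omega)$ is the characteristic function of X, then the characteristic function of Y is given by
$$\Psi_Y(\omega) = e^{j\omega b}\,\Psi_X(a\omega) \qquad (4.126)$$
By definition (4.50),
$$\Psi_Y(\omega) = E(e^{j\omega Y}) = E[e^{j\omega(aX + b)}] = e^{j\omega b}E(e^{ja\omega X}) = e^{j\omega b}\,\Psi_X(a\omega)$$

Using the characteristic function technique, redo part (b) of Prob. 4.16.

Let Z = X + Y, where X and Y are independent. Then
$$\Psi_Z(\omega) = \Psi_X(\omega)\,\Psi_Y(\omega)$$
Applying the convolution theorem of the Fourier transform (Appendix B), we obtain
$$f_Z(z) = f_X(z) * f_Y(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$$

THE LAWS OF LARGE NUMBERS AND THE CENTRAL LIMIT THEOREM

Verify the weak law of large numbers (4.58); that is,
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0 \qquad \text{for any } \varepsilon > 0$$
where $\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$ and $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$.

Using Eqs. (4.108) and (4.112), we have
$$E(\bar{X}_n) = \mu \qquad\text{and}\qquad \mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$
Then it follows from Chebyshev's inequality [Eq. (2.97)] (Prob. 2.36) that
$$P(|\bar{X}_n - \mu| > \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2} \qquad (4.129)$$
Since $\lim_{n \to \infty} \sigma^2/(n\varepsilon^2) = 0$, we get
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0$$

Let X be a r.v. with pdf $f_X(x)$ and let $X_1, \ldots, X_n$ be a set of independent r.v.'s each with pdf $f_X(x)$. Then the set of r.v.'s $X_1, \ldots, X_n$ is called a random sample of size n of X. The sample mean is defined by
$$\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i$$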

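Let $X_1, \ldots, X_n$ be a random sample of X with mean $\mu$ and variance $\sigma^2$. How many samples of X should be taken if the probability that the sample mean will not deviate from the true mean $\mu$ by more than $\sigma/10$ is at least 0.95?

Setting $\varepsilon = \sigma/10$ in Eq. (4.129), we have
$$P\!\left(|\bar{X}_n - \mu| > \frac{\sigma}{10}\right) \le \frac{\sigma^2}{n\sigma^2/100} = \frac{100}{n} \qquad\text{or}\qquad P\!\left(|\bar{X}_n - \mu| \le \frac{\sigma}{10}\right) \ge 1 - \frac{100}{n}$$
Thus if we want this probability to be at least 0.95, we must have $100/n \le 0.05$ or $n \ge 100/0.05 = 2000$.

Verify the central limit theorem (4.61).

Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed r.v.'s with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$. Consider the sum $S_n = X_1 + \cdots + X_n$. Then by Eqs. (4.108) and (4.112), we have $E(S_n) = n\mu$ and $\mathrm{Var}(S_n) = n\sigma^2$. Let
$$Z_n = \frac{S_n - n\mu}{\sqrt{n}\,\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{X_i - \mu}{\sigma}$$
Then by Eqs. (4.105) and (4.106), we have $E(Z_n) = 0$ and $\mathrm{Var}(Z_n) = 1$. Let M(t) be the moment generating function of the standardized r.v. $Y_i = (X_i - \mu)/\sigma$. Since $E(Y_i) = 0$ and $E(Y_i^2) = \mathrm{Var}(Y_i) = 1$, by Eq. (4.42), we have
$$M(0) = 1 \qquad M'(0) = E(Y_i) = 0 \qquad M''(0) = E(Y_i^2) = 1$$
Given that M'(t) and M''(t) are continuous functions of t, a Taylor (or Maclaurin) expansion of M(t) about t = 0 can be expressed as
$$M(t) = M(0) + M'(0)\,t + M''(t_1)\frac{t^2}{2!} = 1 + M''(t_1)\frac{t^2}{2} \qquad 0 \le t_1 \le t$$
By adding and subtracting $t^2/2$, we have
$$M(t) = 1 + \tfrac{1}{2}t^2 + \tfrac{1}{2}[M''(t_1) - 1]\,t^2 \qquad (4.132)$$
Now, by Eqs. (4.120) and (4.122), the moment generating function of $Z_n$ is
$$M_{Z_n}(t) = \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n \qquad (4.133)$$
Using Eq. (4.132), Eq. (4.133) can be written as
$$M_{Z_n}(t) = \left[1 + \frac{t^2}{2n} + \frac{1}{2n}[M''(t_1) - 1]\,t^2\right]^n$$
where now $t_1$ is between 0 and $t/\sqrt{n}$. Since M''(t) is continuous at t = 0 and $t_1 \to 0$ as $n \to \infty$, we have
$$\lim_{n \to \infty} [M''(t_1) - 1] = M''(0) - 1 = 1 - 1 = 0$$
Thus, from elementary calculus, $\lim_{n \to \infty} (1 + x/n)^n = e^x$, and we obtain
$$\lim_{n \to \infty} M_{Z_n}(t) = e^{t^2/2}$$
The right-hand side is the moment generating function of the standard normal r.v. Z = N(0; 1) [Eq. (4.119)]. Hence, by Lemma 4.2 of the moment generating function,
$$\lim_{n \to \infty} Z_n = N(0; 1)$$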
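The limiting step of this proof, $[M(t/\sqrt{n})]^n \to e^{t^2/2}$, can be watched numerically for a concrete standardized r.v. The sketch below assumes, purely for illustration, that each $Y_i$ takes the values +1 and -1 with probability 0.5 each (zero mean, unit variance), so that $M(t) = \cosh t$; the function names and the choice t = 1 are likewise assumptions.

```python
import math

def mgf_of_standardized_sum(t, n):
    """[M(t / sqrt(n))]^n with M(t) = cosh(t), the mgf of a +/-1 r.v. with equal probabilities."""
    return math.cosh(t / math.sqrt(n)) ** n

def mgf_std_normal(t):
    """Moment generating function of N(0; 1)."""
    return math.exp(t * t / 2.0)

t = 1.0
sizes = (10, 100, 1000, 10000)
gaps = [abs(mgf_of_standardized_sum(t, n) - mgf_std_normal(t)) for n in sizes]
for n, gap in zip(sizes, gaps):
    print(n, gap)
```

For this choice the gap shrinks roughly in proportion to 1/n, which is consistent with the vanishing correction term in the expansion above.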

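Let $X_1, \ldots, X_n$ be n independent Cauchy r.v.'s with the identical Cauchy pdf with parameter a considered earlier. Let
$$Y_n = \frac{1}{n}(X_1 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Find the characteristic function of $Y_n$. Find the pdf of $Y_n$. Does the central limit theorem hold?

From Eq. (4.124), the characteristic function of $X_i$ is
$$\Psi_{X_i}(\omega) = e^{-a|\omega|}$$
Let $Y = X_1 + \cdots + X_n$. Then the characteristic function of Y is
$$\Psi_Y(\omega) = E(e^{j\omega Y}) = \prod_{i=1}^{n} \Psi_{X_i}(\omega) = e^{-na|\omega|}$$
Now $Y_n = (1/n)Y$. Thus, by Eq. (4.126), the characteristic function of $Y_n$ is
$$\Psi_{Y_n}(\omega) = \Psi_Y\!\left(\frac{\omega}{n}\right) = e^{-na|\omega/n|} = e^{-a|\omega|} \qquad (4.135)$$
Equation (4.135) indicates that $Y_n$ is also a Cauchy r.v. with parameter a, and its pdf is the same as that of $X_i$. Since the characteristic function of $Y_n$ is independent of n and so is its pdf, $Y_n$ does not tend to a normal r.v. as $n \to \infty$, and so the central limit theorem does not hold in this case. (The theorem requires a finite mean and variance, which the Cauchy r.v. does not have.)

Let Y be a binomial r.v. with parameters (n, p). Using the central limit theorem, derive the approximation formula
$$P(Y \le y) \approx \Phi\!\left(\frac{y - np}{\sqrt{np(1 - p)}}\right) \qquad (4.136)$$
where $\Phi(z)$ is the cdf of a standard normal r.v. [Eq. (2.54)].

We saw earlier that if $X_1, \ldots, X_n$ are independent Bernoulli r.v.'s, each with parameter p, then $Y = X_1 + \cdots + X_n$ is a binomial r.v. with parameters (n, p). Since the $X_i$'s are independent, we can apply the central limit theorem to the r.v. $Z_n$ defined by
$$Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{X_i - p}{\sqrt{p(1 - p)}} = \frac{Y - np}{\sqrt{np(1 - p)}} \qquad (4.137)$$
Thus, for large n, $Z_n$ is approximately normally distributed and
$$P(Z_n \le x) \approx \Phi(x) \qquad (4.138)$$
Substituting Eq. (4.137) into Eq. (4.138) gives
$$P\!\left(\frac{Y - np}{\sqrt{np(1 - p)}} \le x\right) = P\!\left[Y \le x\sqrt{np(1 - p)} + np\right] \approx \Phi(x) \qquad\text{or}\qquad P(Y \le y) \approx \Phi\!\left(\frac{y - np}{\sqrt{np(1 - p)}}\right)$$
Because we are approximating a discrete distribution by a continuous one, a slightly better approximation is given by
$$P(Y \le y) \approx \Phi\!\left(\frac{y + \tfrac{1}{2} - np}{\sqrt{np(1 - p)}}\right) \qquad (4.139)$$
Formula (4.139) is referred to as a continuity correction of Eq. (4.136).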
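The two approximation formulas, (4.136) and its continuity-corrected form (4.139), can be compared against the exact binomial cdf. The following is a minimal sketch with illustrative parameter values (n = 100, p = 0.5, y = 55); the function names are assumptions, $\Phi$ is evaluated through the error function `math.erf`, and `math.comb` requires Python 3.8 or later.

```python
import math

def std_normal_cdf(z):
    """cdf of N(0; 1), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_cdf(y, n, p):
    """Exact P(Y <= y) for a binomial r.v. with parameters (n, p)."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(y + 1))

def normal_approx(y, n, p, continuity=False):
    """Normal approximation to P(Y <= y), optionally with the 1/2 continuity correction."""
    shift = 0.5 if continuity else 0.0
    return std_normal_cdf((y + shift - n * p) / math.sqrt(n * p * (1.0 - p)))

n, p, y = 100, 0.5, 55  # illustrative values
exact = binomial_cdf(y, n, p)
plain = normal_approx(y, n, p)
corrected = normal_approx(y, n, p, continuity=True)
print(exact, plain, corrected)
```

For these values the corrected approximation lands noticeably closer to the exact cdf than the uncorrected one, which is the behaviour the continuity correction is meant to deliver.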

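Let Y be a Poisson r.v. with parameter $\lambda$. Using the central limit theorem, derive the approximation formula:
$$P(Y \le y) \approx \Phi\!\left(\frac{y - \lambda}{\sqrt{\lambda}}\right) \qquad (4.140)$$
We saw earlier that if $X_1, \ldots, X_n$ are independent Poisson r.v.'s $X_i$ having parameter $\lambda_i$, then $Y = X_1 + \cdots + X_n$ is also a Poisson r.v. with parameter $\lambda = \lambda_1 + \cdots + \lambda_n$. Using this fact, we can view a Poisson r.v. Y with parameter $\lambda$ as a sum of independent Poisson r.v.'s $X_i$, i = 1, ..., n, each with parameter $\lambda/n$; that is,
$$Y = X_1 + \cdots + X_n \qquad E(X_i) = \frac{\lambda}{n} = \mathrm{Var}(X_i)$$
The central limit theorem then implies that the r.v. Z defined by
$$Z = \frac{Y - \lambda}{\sqrt{\lambda}} \qquad (4.141)$$
is approximately normal and
$$P(Z \le z) \approx \Phi(z) \qquad (4.142)$$
Substituting Eq. (4.141) into Eq. (4.142) gives
$$P\!\left(\frac{Y - \lambda}{\sqrt{\lambda}} \le z\right) = P(Y \le \sqrt{\lambda}\,z + \lambda) \approx \Phi(z) \qquad\text{or}\qquad P(Y \le y) \approx \Phi\!\left(\frac{y - \lambda}{\sqrt{\lambda}}\right)$$
Again, using a continuity correction, a slightly better approximation is given by
$$P(Y \le y) \approx \Phi\!\left(\frac{y + \tfrac{1}{2} - \lambda}{\sqrt{\lambda}}\right) \qquad (4.143)$$

Supplementary Problems

Let Y = 2X + 3. Find the pdf of Y if X is a uniform r.v. over (-1, 2).
Ans. $f_Y(y) = \begin{cases} \tfrac{1}{6} & 1 < y < 7 \\ 0 & \text{otherwise} \end{cases}$

Let X be a r.v. with pdf $f_X(x)$. Let Y = |X|. Find the pdf of Y in terms of $f_X(x)$.
Ans. $f_Y(y) = \begin{cases} f_X(y) + f_X(-y) & y > 0 \\ 0 & y < 0 \end{cases}$

Let Y = sin X, where X is uniformly distributed over $(0, 2\pi)$. Find the pdf of Y.
Ans. $f_Y(y) = \begin{cases} \dfrac{1}{\pi\sqrt{1 - y^2}} & -1 < y < 1 \\ 0 & \text{otherwise} \end{cases}$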

Let X and Y be independent r.v.'s, each uniformly distributed over (0, 1). Let Z = X + Y, W = X - Y. Find the marginal pdfs of Z and W.
Ans. $f_Z(z) = \begin{cases} z & 0 < z < 1 \\ 2 - z & 1 < z < 2 \\ 0 & \text{otherwise} \end{cases} \qquad f_W(w) = \begin{cases} w + 1 & -1 < w < 0 \\ 1 - w & 0 < w < 1 \\ 0 & \text{otherwise} \end{cases}$

Let X and Y be independent exponential r.v.'s with parameters $\alpha$ and $\beta$, respectively. Find the pdf of (a) Z = X - Y; (b) Z = X/Y; (c) Z = max(X, Y); (d) Z = min(X, Y).

Let X denote the number of heads obtained when three independent tossings of a fair coin are made. Let $Y = X^2$. Find E(Y).
Ans. 3

Let X be a uniform r.v. over (-1, 1). Let $Y = X^n$. (a) Calculate the covariance of X and Y. (b) Calculate the correlation coefficient of X and Y.
Ans. (a) $\mathrm{Cov}(X, Y) = \begin{cases} \dfrac{1}{n + 2} & n = \text{odd} \\ 0 & n = \text{even} \end{cases}$ (b) $\rho_{XY} = \begin{cases} \dfrac{\sqrt{3(2n + 1)}}{n + 2} & n = \text{odd} \\ 0 & n = \text{even} \end{cases}$

Let the moment generating function of a discrete r.v. X be given by
$$M_X(t) = 0.25e^{t} + \ldots\,e^{3t} + \ldots\,e^{5t}$$
Find P(X = 3).
Ans.

Let X be a geometric r.v. with parameter p. (a) Determine the moment generating function of X. (b) Find the mean of X for p = 3.
Ans. (a) $M_X(t) = \dfrac{pe^t}{1 - qe^t}$, q = 1 - p

Let X be a uniform r.v. over (a, b). (a) Determine the moment generating function of X. (b) Using the result of (a), find E(X), $E(X^2)$, and $E(X^3)$.
Ans. (a) $M_X(t) = \dfrac{e^{tb} - e^{ta}}{t(b - a)}$
(b) $E(X) = \tfrac{1}{2}(b + a)$, $E(X^2) = \tfrac{1}{3}(b^2 + ab + a^2)$, $E(X^3) = \tfrac{1}{4}(b^3 + b^2a + ba^2 + a^3)$

Consider a r.v. X with pdf

Find the moment generating function of X.
Ans. $M_X(t) = e^{\ldots}$

Let X = N(0; 1). Using the moment generating function of X, determine $E(X^n)$.
Ans. $E(X^n) = \begin{cases} 0 & n = 1, 3, 5, \ldots \\ 1 \cdot 3 \cdot 5 \cdots (n - 1) & n = 2, 4, 6, \ldots \end{cases}$

Let X and Y be independent binomial r.v.'s with parameters (n, p) and (m, p), respectively. Let Z = X + Y. What is the distribution of Z?
Hint: Use the moment generating functions.
Ans. Z is a binomial r.v. with parameters (n + m, p).

Let (X, Y) be a continuous bivariate r.v. with joint pdf
$$f_{XY}(x, y) = \begin{cases} e^{-(x + y)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases}$$
(a) Find the joint moment generating function of X and Y. (b) Find the joint moments $m_{10}$, $m_{01}$, and $m_{11}$.
Ans. (a) $M_{XY}(t_1, t_2) = \dfrac{1}{(1 - t_1)(1 - t_2)}$ (b) $m_{10} = 1$, $m_{01} = 1$, $m_{11} = 1$

Let (X, Y) be a bivariate normal r.v. defined by Eq. (3.88). Find the joint moment generating function of X and Y.

Let $X_1, \ldots, X_n$ be n independent r.v.'s and $X_i > 0$. Let
$$Y = X_1 \cdot X_2 \cdots X_n = \prod_{i=1}^{n} X_i$$
Show that for large n, the pdf of Y is approximately log-normal.
Hint: Take the natural logarithm of Y and use the central limit theorem and the result of Prob.

Let $Y = (X - \lambda)/\sqrt{\lambda}$, where X is a Poisson r.v. with parameter $\lambda$. Show that $Y \approx N(0; 1)$ when $\lambda$ is sufficiently large.
Hint: Find the moment generating function of Y and let $\lambda \to \infty$.

Consider an experiment of tossing a fair coin 1000 times. Find the probability of obtaining more than 520 heads (a) by using formula (4.136), and (b) by formula (4.139).
Ans. (a) (b)

The number of cars entering a parking lot is Poisson distributed with a rate of 100 cars per hour. Find the time required for more than 200 cars to have entered the parking lot with probability 0.90 (a) by using formula (4.140), and (b) by formula (4.143).
Ans. (a) h (b) h

Chapter 5

Random Processes

5.1 INTRODUCTION

In this chapter, we introduce the concept of a random (or stochastic) process. The theory of random processes was first developed in connection with the study of fluctuations and noise in physical systems. A random process is the mathematical model of an empirical process whose development is governed by probability laws. Random processes provide useful models for studies in such diverse fields as statistical physics, communication and control, time series analysis, population growth, and management sciences.

5.2 RANDOM PROCESSES

A. Definition:

A random process is a family of r.v.'s $\{X(t), t \in T\}$ defined on a given probability space, indexed by the parameter t, where t varies over an index set T.

Recall that a random variable is a function defined on the sample space S (Sec. 2.2). Thus, a random process $\{X(t), t \in T\}$ is really a function of two arguments $\{X(t, \zeta), t \in T, \zeta \in S\}$. For a fixed $t\,(= t_k)$, $X(t_k, \zeta) = X_k(\zeta)$ is a r.v. denoted by $X(t_k)$, as $\zeta$ varies over the sample space S. On the other hand, for a fixed sample point $\zeta_i \in S$, $X(t, \zeta_i) = X_i(t)$ is a single function of time t, called a sample function or a realization of the process. The totality of all sample functions is called an ensemble.

Of course if both $\zeta$ and t are fixed, $X(t_k, \zeta_i)$ is simply a real number. In the following we use the notation X(t) to represent $X(t, \zeta)$.

B. Description of a Random Process:

In a random process $\{X(t), t \in T\}$, the index set T is called the parameter set of the random process. The values assumed by X(t) are called states, and the set of all possible values forms the state space E of the random process. If the index set T of a random process is discrete, then the process is called a discrete-parameter (or discrete-time) process. A discrete-parameter process is also called a random sequence and is denoted by $\{X_n, n = 1, 2, \ldots\}$. If T is continuous, then we have a continuous-parameter (or continuous-time) process. If the state space E of a random process is discrete, then the process is called a discrete-state process, often referred to as a chain. In this case, the state space E is often assumed to be $\{0, 1, 2, \ldots\}$. If the state space E is continuous, then we have a continuous-state process.

A complex random process X(t) is defined by
$$X(t) = X_1(t) + jX_2(t)$$
where $X_1(t)$ and $X_2(t)$ are (real) random processes and $j = \sqrt{-1}$. Throughout this book, all random processes are real random processes unless specified otherwise.

5.3 CHARACTERIZATION OF RANDOM PROCESSES

A. Probabilistic Descriptions:

Consider a random process X(t). For a fixed time $t_1$, $X(t_1) = X_1$ is a r.v., and its cdf $F_X(x_1; t_1)$ is defined as
$$F_X(x_1; t_1) = P\{X(t_1) \le x_1\}$$

$F_X(x_1; t_1)$ is known as the first-order distribution of X(t). Similarly, given $t_1$ and $t_2$, $X(t_1) = X_1$ and $X(t_2) = X_2$ represent two r.v.'s. Their joint distribution is known as the second-order distribution of X(t) and is given by
$$F_X(x_1, x_2; t_1, t_2) = P\{X(t_1) \le x_1,\ X(t_2) \le x_2\}$$
In general, we define the nth-order distribution of X(t) by
$$F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = P\{X(t_1) \le x_1, \ldots, X(t_n) \le x_n\}$$
If X(t) is a discrete-state process, then X(t) is specified by a collection of pmfs:
$$p_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = P\{X(t_1) = x_1, \ldots, X(t_n) = x_n\} \qquad (5.4)$$
If X(t) is a continuous-state process, then X(t) is specified by a collection of pdfs:
$$f_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = \frac{\partial^n F_X(x_1, \ldots, x_n; t_1, \ldots, t_n)}{\partial x_1 \cdots \partial x_n}$$
The complete characterization of X(t) requires knowledge of all the distributions as $n \to \infty$. Fortunately, often much less is sufficient.

B. Mean, Correlation, and Covariance Functions:

As in the case of r.v.'s, random processes are often described by using statistical averages. The mean of X(t) is defined by
$$\mu_X(t) = E[X(t)]$$
where X(t) is treated as a random variable for a fixed value of t. In general, $\mu_X(t)$ is a function of time, and it is often called the ensemble average of X(t). A measure of dependence among the r.v.'s of X(t) is provided by its autocorrelation function, defined by
$$R_X(t, s) = E[X(t)X(s)] \qquad (5.7)$$
Note that
$$R_X(t, s) = R_X(s, t) \qquad (5.8)$$
and
$$R_X(t, t) = E[X^2(t)] \qquad (5.9)$$
The autocovariance function of X(t) is defined by
$$K_X(t, s) = \mathrm{Cov}[X(t), X(s)] = E\{[X(t) - \mu_X(t)][X(s) - \mu_X(s)]\} = R_X(t, s) - \mu_X(t)\mu_X(s) \qquad (5.10)$$
It is clear that if the mean of X(t) is zero, then $K_X(t, s) = R_X(t, s)$. Note that the variance of X(t) is given by
$$\sigma_X^2(t) = \mathrm{Var}[X(t)] = E\{[X(t) - \mu_X(t)]^2\} = K_X(t, t)$$
If X(t) is a complex random process, then its autocorrelation function $R_X(t, s)$ and autocovariance function $K_X(t, s)$ are defined, respectively, by
$$R_X(t, s) = E[X(t)X^*(s)] \qquad\text{and}\qquad K_X(t, s) = E\{[X(t) - \mu_X(t)][X(s) - \mu_X(s)]^*\}$$

5.4 CLASSIFICATION OF RANDOM PROCESSES

If a random process X(t) possesses some special probabilistic structure, we can specify less to characterize X(t) completely. Some simple random processes are characterized completely by only the first- and second-order distributions.
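The ensemble averages defined above can be computed exactly when the sample space is finite. The following sketch uses a hypothetical four-point ensemble that is not taken from the text: each outcome fixes a pair of amplitudes (a, b), all outcomes are equally likely, and the sample function is $a\cos t + b\sin t$. All names are illustrative. For this ensemble the mean is zero and the autocorrelation works out to $\tfrac{1}{2}\cos(t - s)$.

```python
import math

# Hypothetical ensemble: four equally likely outcomes, each fixing amplitudes (a, b).
OUTCOMES = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

def sample_function(outcome, t):
    """Value at time t of the sample function selected by one outcome."""
    a, b = outcome
    return a * math.cos(t) + b * math.sin(t)

def mean(t):
    """Ensemble average mu_X(t) = E[X(t)]."""
    return sum(sample_function(o, t) for o in OUTCOMES) / len(OUTCOMES)

def autocorrelation(t, s):
    """R_X(t, s) = E[X(t) X(s)]."""
    return sum(sample_function(o, t) * sample_function(o, s) for o in OUTCOMES) / len(OUTCOMES)

def autocovariance(t, s):
    """K_X(t, s) = R_X(t, s) - mu_X(t) mu_X(s)."""
    return autocorrelation(t, s) - mean(t) * mean(s)

t, s = 0.3, 1.1
print(mean(t), autocorrelation(t, s), autocovariance(t, s))
print(0.5 * math.cos(t - s))  # closed form for this particular ensemble
```

Fixing an outcome and varying t traces one sample function, while fixing t and averaging over the outcomes gives the ensemble average; the code keeps those two roles separate.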

A. Stationary Processes:

A random process $\{X(t), t \in T\}$ is said to be stationary or strict-sense stationary if, for all n and for every set of time instants $\{t_i \in T, i = 1, 2, \ldots, n\}$,
$$F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = F_X(x_1, \ldots, x_n; t_1 + \tau, \ldots, t_n + \tau) \qquad (5.14)$$
for any $\tau$. Hence, the distribution of a stationary process will be unaffected by a shift in the time origin, and X(t) and $X(t + \tau)$ will have the same distributions for any $\tau$. Thus, for the first-order distribution,
$$F_X(x; t) = F_X(x; t + \tau) = F_X(x) \qquad\text{and}\qquad f_X(x; t) = f_X(x)$$
Then
$$\mu_X(t) = E[X(t)] = \mu \qquad \mathrm{Var}[X(t)] = \sigma^2$$
where $\mu$ and $\sigma^2$ are constants. Similarly, for the second-order distribution,
$$F_X(x_1, x_2; t_1, t_2) = F_X(x_1, x_2; t_2 - t_1) \qquad (5.19)$$
and
$$f_X(x_1, x_2; t_1, t_2) = f_X(x_1, x_2; t_2 - t_1) \qquad (5.20)$$
Nonstationary processes are characterized by distributions depending on the points $t_1, t_2, \ldots, t_n$.

B. Wide-Sense Stationary Processes:

If stationary condition (5.14) of a random process X(t) does not hold for all n but holds for $n \le k$, then we say that the process X(t) is stationary to order k. If X(t) is stationary to order 2, then X(t) is said to be wide-sense stationary (WSS) or weakly stationary. If X(t) is a WSS random process, then we have
1. $E[X(t)] = \mu$ (constant)
2. $R_X(t, s) = E[X(t)X(s)] = R_X(|s - t|)$
Note that a strict-sense stationary process is also a WSS process, but, in general, the converse is not true.

C. Independent Processes:

In a random process X(t), if $X(t_i)$ for i = 1, 2, ..., n are independent r.v.'s, so that for n = 2, 3, ...,
$$F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = \prod_{i=1}^{n} F_X(x_i; t_i)$$
then we call X(t) an independent random process. Thus, a first-order distribution is sufficient to characterize an independent random process X(t).

D. Processes with Stationary Independent Increments:

A random process $\{X(t), t \ge 0\}$ is said to have independent increments if whenever $0 < t_1 < t_2 < \cdots < t_n$,
$$X(0),\ X(t_1) - X(0),\ X(t_2) - X(t_1),\ \ldots,\ X(t_n) - X(t_{n-1})$$

172 RANDOM PROCESSES [CHAP 5 are independent. If {X(t), t 2 0) has independent increments and X(t) - X(s) has the same distribution as X(t + h) - X(s + h) for all s, t, h 2 0, s < t, then the process X(t) is said to have stationary independent increments. Let {X(t), t 2 0) be a random process with stationary independent increments and assume that X(0) = 0. Then (Probs and 5.22) where p, = E[X(l)] and where a12 = Var[X(l)]. From Eq. (5.24), we see that processes with stationary independent increments are nonstationary. Examples of processes with stationary independent increments are Poisson processes and Wiener processes, which are discussed in later sections. E. Markov Processes: A random process (X(t), t E 7') is said to be a Markov process if whenever t1 < t2 < < t, < t,,,. A discrete-state Markov process is called a Markov chain. For a discrete-parameter Markov chain {X,, n 2 0) (see Sec. 5.5), we have for every n P(X,+, = j (X, = i,, Xi = i,,..., Xn = i) = P(Xn+, = jlxn = i) (5.27) Equation (5.26) or Eq. (5.27) is referred to as the Markov property (which is also known as the memoryless property). This property of a Markov process states that the future state of the process depends only on the present state and not on the past history. Clearly, any process with independent increments is a Markov process. Using the Markov property, the nth-order distribution of a Markov process X(t) can be expressed as (Prob. 5.25) Thus, all finite-order distributions of a Markov process distributions. p{x(tk) 2 ~ k) I X(tk - 1) = xk - I) (5.28) can be expressed in terms of the second-order F. Normal Processes : A random process {X(t), t E T) is said to be a normal (or gaussian) process if for any integer n and any subset (t,,..., t,) of T, the n r.v.'s X(tl),..., X(t,) are jointly normally distributed in the sense that their joint characteristic function is given by where w,,..., on are any real numbers (see Probs and 5.60). 
Equation (5.29) shows that a normal process is completely characterized by the second-order distributions. Thus, if a normal process is wide-sense stationary, then it is also strictly stationary.
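The linear growth of the mean and variance stated in Eqs. (5.24) and (5.25) of Sec. D can be checked exactly on a small example. The sketch below is only an illustration under stated assumptions: the process is taken to be a sum of iid $\pm 1$ increments observed at integer times (so its increments are stationary and independent), the value $p = 3/5$ is arbitrary, and the helper names (`increment_pmf`, `convolve`, `moments_by_time`) are hypothetical.

```python
from fractions import Fraction

def increment_pmf(p):
    """pmf of one increment: +1 with probability p, -1 with probability 1 - p."""
    return {1: p, -1: 1 - p}

def convolve(pmf_a, pmf_b):
    """pmf of the sum of two independent integer-valued r.v.'s."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def mean_var(pmf):
    """Mean and variance of an r.v. given as a {value: probability} dict."""
    mean = sum(x * px for x, px in pmf.items())
    var = sum((x - mean) ** 2 * px for x, px in pmf.items())
    return mean, var

def moments_by_time(p, t_max):
    """Exact (mean, variance) of X(t) for t = 1, ..., t_max, with X(0) = 0."""
    step = increment_pmf(p)
    pmf = {0: Fraction(1)}          # X(0) = 0 with probability 1
    result = []
    for _ in range(t_max):
        pmf = convolve(pmf, step)   # add one more independent increment
        result.append(mean_var(pmf))
    return result

p = Fraction(3, 5)                  # assumed value, for illustration only
moments = moments_by_time(p, 5)
mu1, var1 = moments[0]              # E[X(1)] and Var[X(1)]
for t, (m, v) in enumerate(moments, start=1):
    # Each line shows t, E[X(t)], Var[X(t)] and whether they equal mu1*t, var1*t.
    print(t, m, v, m == mu1 * t, v == var1 * t)
```

Because both moments grow with $t$, the first-order distribution depends on $t$, which is the sense in which such a process is nonstationary.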

G. Ergodic Processes:

Consider a random process $\{X(t), -\infty < t < \infty\}$ with a typical sample function $x(t)$. The time average of $x(t)$ is defined as

$$\bar{x} = \langle x(t) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t)\, dt \qquad (5.30)$$

Similarly, the time autocorrelation function $\bar{R}_X(\tau)$ of $x(t)$ is defined as

$$\bar{R}_X(\tau) = \langle x(t) x(t + \tau) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t) x(t + \tau)\, dt \qquad (5.31)$$

A random process is said to be ergodic if it has the property that the time averages of sample functions of the process are equal to the corresponding statistical or ensemble averages. The subject of ergodicity is extremely complicated. However, in most physical applications, it is assumed that stationary processes are ergodic.

5.5 DISCRETE-PARAMETER MARKOV CHAINS

In this section we treat a discrete-parameter Markov chain $\{X_n, n \ge 0\}$ with a discrete state space $E = \{0, 1, 2, \ldots\}$, where this set may be finite or infinite. If $X_n = i$, then the Markov chain is said to be in state $i$ at time $n$ (or the $n$th step). A discrete-parameter Markov chain $\{X_n, n \ge 0\}$ is characterized by [Eq. (5.27)]

$$P(X_{n+1} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_n = i) = P(X_{n+1} = j \mid X_n = i) \qquad (5.32)$$

where $P(X_{n+1} = j \mid X_n = i)$ are known as one-step transition probabilities. If $P(X_{n+1} = j \mid X_n = i)$ is independent of $n$, then the Markov chain is said to possess stationary transition probabilities and the process is referred to as a homogeneous Markov chain. Otherwise the process is known as a nonhomogeneous Markov chain. Note that the concepts of a Markov chain's having stationary transition probabilities and being a stationary random process should not be confused. The Markov process, in general, is not stationary. We shall consider only homogeneous Markov chains in this section.

A. Transition Probability Matrix:

Let $\{X_n, n \ge 0\}$ be a homogeneous Markov chain with a discrete infinite state space $E = \{0, 1, 2, \ldots\}$. Then

$$p_{ij} = P(X_{n+1} = j \mid X_n = i) \qquad i \ge 0,\ j \ge 0$$

regardless of the value of $n$. A transition probability matrix of $\{X_n, n \ge 0\}$ is defined by

$$P = [p_{ij}] = \begin{bmatrix} p_{00} & p_{01} & p_{02} & \cdots \\ p_{10} & p_{11} & p_{12} & \cdots \\ p_{20} & p_{21} & p_{22} & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix}$$

where the elements satisfy

$$p_{ij} \ge 0 \qquad \sum_{j=0}^{\infty} p_{ij} = 1 \qquad i = 0, 1, 2, \ldots \qquad (5.34)$$

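In the case where the state space $E$ is finite and equal to $\{1, 2, \ldots, m\}$, $P$ is $m \times m$ dimensional; that is,

$$P = [p_{ij}] = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix}$$

where

$$p_{ij} \ge 0 \qquad \sum_{j=1}^{m} p_{ij} = 1 \qquad i = 1, 2, \ldots, m \qquad (5.35)$$

A square matrix whose elements satisfy Eq. (5.34) or (5.35) is called a Markov matrix or stochastic matrix.

B. Higher-Order Transition Probabilities: Chapman-Kolmogorov Equation:

Tractability of Markov chain models is based on the fact that the probability distribution of $\{X_n, n \ge 0\}$ can be computed by matrix manipulations. Let $P = [p_{ij}]$ be the transition probability matrix of a Markov chain $\{X_n, n \ge 0\}$. Matrix powers of $P$ are defined by

$$P^2 = PP$$

with the $(i, j)$th element given by

$$p_{ij}^{(2)} = \sum_k p_{ik} p_{kj}$$

Note that when the state space $E$ is infinite, the series above converges, since by Eq. (5.34),

$$\sum_k p_{ik} p_{kj} \le \sum_k p_{ik} = 1$$

Similarly, $P^3 = PP^2$ has the $(i, j)$th element

$$p_{ij}^{(3)} = \sum_k p_{ik} p_{kj}^{(2)}$$

and in general, $P^{n+1} = PP^n$ has the $(i, j)$th element

$$p_{ij}^{(n+1)} = \sum_k p_{ik} p_{kj}^{(n)}$$

Finally, we define $P^0 = I$, where $I$ is the identity matrix.

The $n$-step transition probabilities for the homogeneous Markov chain $\{X_n, n \ge 0\}$ are defined by

$$P(X_n = j \mid X_0 = i)$$

Then we can show that (Prob. 5.70)

$$p_{ij}^{(n)} = P(X_n = j \mid X_0 = i)$$

We compute $p_{ij}^{(n)}$ by taking matrix powers. The matrix identity

$$P^{n+m} = P^n P^m \qquad n, m \ge 0$$

when written in terms of elements

$$p_{ij}^{(n+m)} = \sum_k p_{ik}^{(n)} p_{kj}^{(m)} \qquad (5.38)$$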

is known as the Chapman-Kolmogorov equation. It expresses the fact that a transition from $i$ to $j$ in $n + m$ steps can be achieved by moving from $i$ to an intermediate $k$ in $n$ steps (with probability $p_{ik}^{(n)}$), and then proceeding to $j$ from $k$ in $m$ steps (with probability $p_{kj}^{(m)}$). Furthermore, the events "go from $i$ to $k$ in $n$ steps" and "go from $k$ to $j$ in $m$ steps" are independent. Hence the probability of the transition from $i$ to $j$ in $n + m$ steps via $i, k, j$ is $p_{ik}^{(n)} p_{kj}^{(m)}$. Finally, the probability of the transition from $i$ to $j$ is obtained by summing over the intermediate state $k$.

C. The Probability Distribution of $\{X_n, n \ge 0\}$:

Let $p_i(n) = P(X_n = i)$ and

$$\mathbf{p}(n) = [p_0(n) \quad p_1(n) \quad p_2(n) \quad \cdots]$$

where

$$\sum_k p_k(n) = 1$$

Then $p_i(0) = P(X_0 = i)$ are the initial-state probabilities,

$$\mathbf{p}(0) = [p_0(0) \quad p_1(0) \quad p_2(0) \quad \cdots]$$

is called the initial-state probability vector, and $\mathbf{p}(n)$ is called the state probability vector after $n$ transitions or the probability distribution of $X_n$. Now it can be shown that (Prob. 5.29)

$$\mathbf{p}(n) = \mathbf{p}(0) P^n \qquad (5.39)$$

which indicates that the probability distribution of a homogeneous Markov chain is completely determined by the one-step transition probability matrix $P$ and the initial-state probability vector $\mathbf{p}(0)$.

D. Classification of States:

1. Accessible States:

State $j$ is said to be accessible from state $i$ if for some $n \ge 0$, $p_{ij}^{(n)} > 0$, and we write $i \to j$. Two states $i$ and $j$ accessible to each other are said to communicate, and we write $i \leftrightarrow j$. If all states communicate with each other, then we say that the Markov chain is irreducible.

2. Recurrent States:

Let

$$T_j = \min\{n \ge 1 : X_n = j\}$$

be the time (or the number of steps) of the first visit to state $j$ after time zero, unless state $j$ is never visited, in which case we set $T_j = \infty$. Then $T_j$ is a discrete r.v. taking values in $\{1, 2, \ldots, \infty\}$. Let

$$f_{ij}^{(m)} = P(T_j = m \mid X_0 = i) = P(X_m = j, X_k \ne j, k = 1, 2, \ldots, m - 1 \mid X_0 = i) \qquad (5.40)$$

and $f_{ij}^{(0)} = 0$ since $T_j \ge 1$. Then

$$f_{ij}^{(1)} = p_{ij} \qquad (5.41)$$

and

$$f_{ij}^{(m)} = \sum_{k \ne j} p_{ik} f_{kj}^{(m-1)} \qquad m = 2, 3, \ldots \qquad (5.42)$$

The probability of visiting $j$ in finite time, starting from $i$, is given by

$$f_{ij} = \sum_{n=0}^{\infty} f_{ij}^{(n)} = P(T_j < \infty \mid X_0 = i) \qquad (5.43)$$

Now state $j$ is said to be recurrent if

$$f_{jj} = P(T_j < \infty \mid X_0 = j) = 1 \qquad (5.44)$$

That is, starting from $j$, the probability of eventual return to $j$ is one. A recurrent state $j$ is said to be positive recurrent if

$$E(T_j \mid X_0 = j) < \infty \qquad (5.45)$$

and state $j$ is said to be null recurrent if

$$E(T_j \mid X_0 = j) = \infty$$

Note that

$$E(T_j \mid X_0 = j) = \sum_{n=0}^{\infty} n f_{jj}^{(n)}$$

3. Transient States:

State $j$ is said to be transient (or nonrecurrent) if

$$f_{jj} = P(T_j < \infty \mid X_0 = j) < 1 \qquad (5.48)$$

In this case there is positive probability of never returning to state $j$.

4. Periodic and Aperiodic States:

We define the period of state $j$ to be

$$d(j) = \gcd\{n \ge 1 : p_{jj}^{(n)} > 0\}$$

where gcd stands for greatest common divisor. If $d(j) > 1$, then state $j$ is called periodic with period $d(j)$. If $d(j) = 1$, then state $j$ is called aperiodic. Note that whenever $p_{jj} > 0$, $j$ is aperiodic.

5. Absorbing States:

State $j$ is said to be an absorbing state if $p_{jj} = 1$; that is, once state $j$ is reached, it is never left.

E. Absorption Probabilities:

Consider a Markov chain $X(n) = \{X_n, n \ge 0\}$ with finite state space $E = \{1, 2, \ldots, N\}$ and transition probability matrix $P$. Let $A = \{1, \ldots, m\}$ be the set of absorbing states and $B = \{m + 1, \ldots, N\}$ be a set of nonabsorbing states. Then the transition probability matrix $P$ can be expressed as

$$P = \begin{bmatrix} I & 0 \\ R & Q \end{bmatrix}$$

where $I$ is an $m \times m$ identity matrix, $0$ is an $m \times (N - m)$ zero matrix, and

$$R = \begin{bmatrix} p_{m+1,1} & \cdots & p_{m+1,m} \\ \vdots & & \vdots \\ p_{N,1} & \cdots & p_{N,m} \end{bmatrix} \qquad Q = \begin{bmatrix} p_{m+1,m+1} & \cdots & p_{m+1,N} \\ \vdots & & \vdots \\ p_{N,m+1} & \cdots & p_{N,N} \end{bmatrix}$$

Note that the elements of $R$ are the one-step transition probabilities from nonabsorbing to absorbing states, and the elements of $Q$ are the one-step transition probabilities among the nonabsorbing states.

Let $U = [u_{kj}]$, where

$$u_{kj} = P\{X_n = j\ (\in A) \mid X_0 = k\ (\in B)\}$$

It is seen that $U$ is an $(N - m) \times m$ matrix and its elements are the absorption probabilities for the various absorbing states. Then it can be shown that (Prob. 5.40)

$$U = (I - Q)^{-1} R = \Phi R \qquad (5.50)$$

The matrix $\Phi = (I - Q)^{-1}$ is known as the fundamental matrix of the Markov chain $X(n)$. Let $T_k$ denote the total time units (or steps) to absorption from state $k$. Let

$$\mathbf{T} = [T_{m+1} \quad T_{m+2} \quad \cdots \quad T_N]$$

Then it can be shown that (Prob. 5.74)

$$E(T_k) = \sum_{i=m+1}^{N} \phi_{ki} \qquad k = m + 1, \ldots, N$$

where $\phi_{ki}$ is the $(k, i)$th element of the fundamental matrix $\Phi$.

F. Stationary Distributions:

Let $P$ be the transition probability matrix of a homogeneous Markov chain $\{X_n, n \ge 0\}$. If there exists a probability vector $\hat{\mathbf{p}}$ such that

$$\hat{\mathbf{p}} P = \hat{\mathbf{p}} \qquad (5.52)$$

then $\hat{\mathbf{p}}$ is called a stationary distribution for the Markov chain. Equation (5.52) indicates that a stationary distribution $\hat{\mathbf{p}}$ is a (left) eigenvector of $P$ with eigenvalue 1. Note that any nonzero multiple of $\hat{\mathbf{p}}$ is also an eigenvector of $P$. But the stationary distribution $\hat{\mathbf{p}}$ is fixed by being a probability vector; that is, its components sum to unity.

G. Limiting Distributions:

A Markov chain is called regular if there is a finite positive integer $m$ such that after $m$ time-steps, every state has a nonzero chance of being occupied, no matter what the initial state. Let $A > 0$ denote that every element $a_{ij}$ of $A$ satisfies the condition $a_{ij} > 0$. Then, for a regular Markov chain with transition probability matrix $P$, there exists an $m > 0$ such that $P^m > 0$. For a regular homogeneous Markov chain we have the following theorem:

THEOREM

Let $\{X_n, n \ge 0\}$ be a regular homogeneous finite-state Markov chain with transition matrix $P$. Then

$$\lim_{n \to \infty} P^n = \hat{P}$$

where $\hat{P}$ is a matrix whose rows are identical and equal to the stationary distribution $\hat{\mathbf{p}}$ for the Markov chain defined by Eq. (5.52).

5.6 POISSON PROCESSES

A. Definitions:

Let $t$ represent a time variable. Suppose an experiment begins at $t = 0$. Events of a particular kind occur randomly, the first at $T_1$, the second at $T_2$, and so on. The r.v. $T_i$ denotes the time at which the $i$th event occurs, and the values $t_i$ of $T_i$ ($i = 1, 2, \ldots$) are called points of occurrence (Fig. 5-1).
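The stationary distribution of Eq. (5.52) and the limiting behavior of $P^n$ described in Secs. 5.5F and 5.5G can be illustrated numerically. The following is a minimal sketch, not a general-purpose solver: the two-state matrix is an assumed example, the function names are hypothetical, and the elimination routine assumes the chain is regular so that the solution is unique.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(P, n):
    """P^n by repeated multiplication, with P^0 = I."""
    size = len(P)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

def stationary_distribution(P):
    """Solve p P = p with sum(p) = 1 by Gauss-Jordan elimination.

    Assumes a unique solution exists (for example, a regular chain).
    """
    n = len(P)
    # Rows of (P^T - I) p = 0, each augmented with a right-hand side of 0.
    rows = [[P[j][i] - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
            for i in range(n)]
    # Replace the last (redundant) equation with the normalization sum(p) = 1.
    rows[-1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

# Assumed two-state example (not taken from the text above).
P = [[Fraction(9, 10), Fraction(1, 10)],
     [Fraction(1, 5), Fraction(4, 5)]]

p_hat = stationary_distribution(P)
P_50 = mat_pow(P, 50)
print("stationary distribution:", p_hat)
print("rows of P^50:", [[float(x) for x in row] for row in P_50])
```

For this assumed matrix every entry of $P$ is already positive, so the chain is regular with $m = 1$, and both rows of $P^{50}$ are close to the stationary distribution.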

Fig. 5-1

Let

$$Z_n = T_n - T_{n-1} \qquad (5.54)$$

and $T_0 = 0$. Then $Z_n$ denotes the time between the $(n - 1)$st and the $n$th events (Fig. 5-1). The sequence of ordered r.v.'s $\{Z_n, n \ge 1\}$ is sometimes called an interarrival process. If all r.v.'s $Z_n$ are independent and identically distributed, then $\{Z_n, n \ge 1\}$ is called a renewal process or a recurrent process. From Eq. (5.54), we see that

$$T_n = Z_1 + Z_2 + \cdots + Z_n$$

where $T_n$ denotes the time from the beginning until the occurrence of the $n$th event. Thus, $\{T_n, n \ge 0\}$ is sometimes called an arrival process.

B. Counting Processes:

A random process $\{X(t), t \ge 0\}$ is said to be a counting process if $X(t)$ represents the total number of "events" that have occurred in the interval $(0, t)$. From its definition, we see that for a counting process, $X(t)$ must satisfy the following conditions:

1. $X(t) \ge 0$ and $X(0) = 0$.
2. $X(t)$ is integer valued.
3. $X(s) \le X(t)$ if $s < t$.
4. $X(t) - X(s)$ equals the number of events that have occurred on the interval $(s, t)$.

A typical sample function (or realization) of $X(t)$ is shown in Fig. 5-2.

A counting process $X(t)$ is said to possess independent increments if the numbers of events which occur in disjoint time intervals are independent. A counting process $X(t)$ is said to possess stationary increments if the number of events in the interval $(s + h, t + h)$, that is, $X(t + h) - X(s + h)$, has the same distribution as the number of events in the interval $(s, t)$, that is, $X(t) - X(s)$, for all $s < t$ and $h > 0$.

Fig. 5-2 A sample function of a counting process.

C. Poisson Processes:

One of the most important types of counting processes is the Poisson process (or Poisson counting process), which is defined as follows:

DEFINITION

A counting process $X(t)$ is said to be a Poisson process with rate (or intensity) $\lambda\ (> 0)$ if

1. $X(0) = 0$.
2. $X(t)$ has independent increments.
3. The number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$; that is, for all $s, t > 0$,

$$P[X(t + s) - X(s) = n] = e^{-\lambda t} \frac{(\lambda t)^n}{n!} \qquad n = 0, 1, 2, \ldots$$

It follows from condition 3 of this definition that a Poisson process has stationary increments and that

$$E[X(t)] = \lambda t$$

Then by Eq. (2.43) (Sec. 2.7C), we have

$$\mathrm{Var}[X(t)] = \lambda t$$

Thus, the expected number of events in the unit interval $(0, 1)$, or any other interval of unit length, is just $\lambda$ (hence the name of the rate or intensity).

An alternative definition of a Poisson process is given as follows:

DEFINITION

A counting process $X(t)$ is said to be a Poisson process with rate (or intensity) $\lambda\ (> 0)$ if

1. $X(0) = 0$.
2. $X(t)$ has independent and stationary increments.
3. $P[X(t + \Delta t) - X(t) = 1] = \lambda\, \Delta t + o(\Delta t)$
4. $P[X(t + \Delta t) - X(t) \ge 2] = o(\Delta t)$

where $o(\Delta t)$ is a function of $\Delta t$ which goes to zero faster than does $\Delta t$; that is,

$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0$$

Note: Since addition or multiplication by a scalar does not change the property of approaching zero, even when divided by $\Delta t$, $o(\Delta t)$ satisfies useful identities such as $o(\Delta t) + o(\Delta t) = o(\Delta t)$ and $a\, o(\Delta t) = o(\Delta t)$ for all constant $a$.

It can be shown that the two definitions above are equivalent (Prob. 5.49). Note that from conditions 3 and 4 of the alternative definition, we have (Prob. 5.50)

$$P[X(t + \Delta t) - X(t) = 0] = 1 - \lambda\, \Delta t + o(\Delta t) \qquad (5.59)$$

Equation (5.59) states that the probability that no event occurs in any short interval approaches unity as the duration of the interval approaches zero. It can be shown that in the Poisson process, the intervals between successive events are independent and identically distributed exponential r.v.'s (Prob. 5.53). Thus, we also identify the Poisson process as a renewal process with exponentially distributed intervals.

The autocorrelation function $R_X(t, s)$ and the autocovariance function $K_X(t, s)$ of a Poisson process $X(t)$ with rate $\lambda$ are given by (Prob. 5.52)

$$R_X(t, s) = \lambda \min(t, s) + \lambda^2 t s$$

$$K_X(t, s) = \lambda \min(t, s)$$
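The two descriptions of the Poisson process given above can be compared numerically. The sketch below is a rough check under assumed values ($\lambda = 2$, $t = 1.5$, $\Delta t = 10^{-4}$, all arbitrary): it evaluates the Poisson probabilities of the first definition, confirms that the mean and variance of the count both come out as $\lambda t$, and shows that for a short interval the probability of exactly one event is close to $\lambda\,\Delta t$ while the probability of two or more events is negligible relative to $\Delta t$. The function name `poisson_pmf` is hypothetical.

```python
import math

def poisson_pmf(n, lam, t):
    """P[n events in an interval of length t] for a Poisson process of rate lam."""
    return math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)

lam, t = 2.0, 1.5                      # assumed rate and interval length

# Mean and variance of the count, summing the pmf over a truncated range.
ns = range(60)                         # the tail beyond n = 59 is negligible here
mean = sum(n * poisson_pmf(n, lam, t) for n in ns)
var = sum((n - mean) ** 2 * poisson_pmf(n, lam, t) for n in ns)
print("mean:", mean, " variance:", var, " lam * t:", lam * t)

# Short-interval behavior, as in conditions 3 and 4 of the alternative definition.
dt = 1e-4
p_one = poisson_pmf(1, lam, dt)
p_two_or_more = 1.0 - poisson_pmf(0, lam, dt) - p_one
print("P[1 event] / dt:", p_one / dt)            # close to lam
print("P[>= 2 events] / dt:", p_two_or_more / dt)  # close to 0
```

The ratios printed in the last two lines approach $\lambda$ and $0$, respectively, as `dt` is made smaller.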

5.7 WIENER PROCESSES

Another example of random processes with independent stationary increments is a Wiener process.

DEFINITION

A random process $\{X(t), t \ge 0\}$ is called a Wiener process if

1. $X(t)$ has stationary independent increments.
2. The increment $X(t) - X(s)$ $(t > s)$ is normally distributed.
3. $E[X(t)] = 0$.
4. $X(0) = 0$.

The Wiener process is also known as the Brownian motion process, since it originates as a model for Brownian motion, the motion of particles suspended in a fluid. From this definition, we can verify that a Wiener process is a normal process (Prob. 5.61) and

$$E[X(t)] = 0 \qquad \mathrm{Var}[X(t)] = \sigma^2 t$$

where $\sigma^2$ is a parameter of the Wiener process which must be determined from observations. When $\sigma^2 = 1$, $X(t)$ is called a standard Wiener (or standard Brownian motion) process.

The autocorrelation function $R_X(t, s)$ and the autocovariance function $K_X(t, s)$ of a Wiener process $X(t)$ are given by (see Prob. 5.23)

$$R_X(t, s) = K_X(t, s) = \sigma^2 \min(t, s) \qquad s, t \ge 0$$

DEFINITION

A random process $\{X(t), t \ge 0\}$ is called a Wiener process with drift coefficient $\mu$ if

1. $X(t)$ has stationary independent increments.
2. $X(t)$ is normally distributed with mean $\mu t$.
3. $X(0) = 0$.

From condition 2, the pdf of a standard Wiener process with drift coefficient $\mu$ is given by

$$f_{X(t)}(x) = \frac{1}{\sqrt{2\pi t}}\, e^{-(x - \mu t)^2 / (2t)}$$

Solved Problems

RANDOM PROCESSES

5.1. Let $X_1, X_2, \ldots$ be independent Bernoulli r.v.'s (Sec. 2.7A) with $P(X_n = 1) = p$ and $P(X_n = 0) = q = 1 - p$ for all $n$. The collection of r.v.'s $\{X_n, n \ge 1\}$ is a random process, and it is called a Bernoulli process.

(a) Describe the Bernoulli process.
(b) Construct a typical sample sequence of the Bernoulli process.

(a) The Bernoulli process $\{X_n, n \ge 1\}$ is a discrete-parameter, discrete-state process. The state space is $E = \{0, 1\}$, and the index set is $T = \{1, 2, \ldots\}$.

(b) A sample sequence of the Bernoulli process can be obtained by tossing a coin consecutively. If a head appears, we assign 1, and if a tail appears, we assign 0. Thus, for instance,

n             1  2  3  4  5  6  7  8  9  10  ...
Coin tossing  H  T  T  H  H  H  T  H  H  T   ...
x_n           1  0  0  1  1  1  0  1  1  0   ...

The sample sequence $\{x_n\}$ obtained above is plotted in Fig. 5-3.

Fig. 5-3 A sample function of a Bernoulli process.

5.2. Let $Z_1, Z_2, \ldots$ be independent identically distributed r.v.'s with $P(Z_n = 1) = p$ and $P(Z_n = -1) = q = 1 - p$ for all $n$. Let

$$X_n = \sum_{i=1}^{n} Z_i \qquad n = 1, 2, \ldots \qquad (5.66)$$

and $X_0 = 0$. The collection of r.v.'s $\{X_n, n \ge 0\}$ is a random process, and it is called the simple random walk $X(n)$ in one dimension.

(a) Describe the simple random walk $X(n)$.
(b) Construct a typical sample sequence (or realization) of $X(n)$.

(a) The simple random walk $X(n)$ is a discrete-parameter (or time), discrete-state random process. The state space is $E = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$, and the index parameter set is $T = \{0, 1, 2, \ldots\}$.

(b) A sample sequence $x(n)$ of a simple random walk $X(n)$ can be produced by tossing a coin every second and letting $x(n)$ increase by unity if a head appears and decrease by unity if a tail appears. Thus, for instance,

n             0  1  2  3   4  5  6  7  8  9  10  ...
Coin tossing     H  T  T   H  H  H  T  H  H  T   ...
x(n)          0  1  0  -1  0  1  2  1  2  3  2   ...

The sample sequence $x(n)$ obtained above is plotted in Fig. 5-4. The simple random walk $X(n)$ specified in this problem is said to be unrestricted because there are no bounds on the possible values of $X_n$.

The simple random walk process is often used in the following primitive gambling model: Toss a coin. If a head appears, you win one dollar; if a tail appears, you lose one dollar (see Prob. 5.38).

5.3. Let $\{X_n, n \ge 0\}$ be a simple random walk of Prob. 5.2. Now let the random process $X(t)$ be defined by

$$X(t) = X_n \qquad n \le t < n + 1$$

(a) Describe $X(t)$.
(b) Construct a typical sample function of $X(t)$.

(a) The random process $X(t)$ is a continuous-parameter (or time), discrete-state random process. The state space is $E = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$, and the index parameter set is $T = \{t, t \ge 0\}$.
(b) A sample function x(t) of X(t) corresponding to Fig. 5-4 is shown in Fig. 5-5.
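The constructions in Probs. 5.1 to 5.3 can be reproduced from the coin-toss record H T T H H H T H H T used in the solutions. This is a small sketch; the variable and function names (`tosses`, `bernoulli`, `walk`, `x_of_t`) are chosen here for illustration and do not come from the text.

```python
import math
from itertools import accumulate

tosses = "HTTHHHTHHT"                    # coin-toss record used in Probs. 5.1 and 5.2

# Prob. 5.1: Bernoulli sample sequence x_1, ..., x_10 (head -> 1, tail -> 0).
bernoulli = [1 if c == "H" else 0 for c in tosses]

# Prob. 5.2: simple random walk, X_0 = 0 and X_n = Z_1 + ... + Z_n
# with Z_i = +1 for a head and -1 for a tail.
steps = [1 if c == "H" else -1 for c in tosses]
walk = [0] + list(accumulate(steps))     # x(0), x(1), ..., x(10)

def x_of_t(walk, t):
    """Prob. 5.3: X(t) = X_n for n <= t < n + 1 (piecewise-constant in t)."""
    return walk[math.floor(t)]

print("Bernoulli sequence:", bernoulli)
print("random walk:", walk)
print("X(3.7) =", x_of_t(walk, 3.7))
```

The continuous-parameter process of Prob. 5.3 simply holds the value of the walk between integer times, which is why `x_of_t` only needs the integer part of `t`.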

Fig. 5-4 A sample function of a random walk.

Fig. 5-5

5.4. Consider a random process $X(t)$ defined by

$$X(t) = Y \cos \omega t \qquad t \ge 0$$

where $\omega$ is a constant and $Y$ is a uniform r.v. over $(0, 1)$.

(a) Describe $X(t)$.
(b) Sketch a few typical sample functions of $X(t)$.

(a) The random process $X(t)$ is a continuous-parameter (or time), continuous-state random process. The state space is $E = \{x: -1 < x < 1\}$ and the index parameter set is $T = \{t: t \ge 0\}$.

(b) Three sample functions of $X(t)$ are sketched in Fig. 5-6.

5.5. Consider patients coming to a doctor's office at random points in time. Let $X_n$ denote the time (in hours) that the $n$th patient has to wait in the office before being admitted to see the doctor.

(a) Describe the random process $X(n) = \{X_n, n \ge 1\}$.
(b) Construct a typical sample function of $X(n)$.

(a) The random process $X(n)$ is a discrete-parameter, continuous-state random process. The state space is $E = \{x: x \ge 0\}$, and the index parameter set is $T = \{1, 2, \ldots\}$.

(b) A sample function $x(n)$ of $X(n)$ is shown in Fig. 5-7.

CHARACTERIZATION OF RANDOM PROCESSES

5.6. Consider the Bernoulli process of Prob. 5.1. Determine the probability of occurrence of the sample sequence obtained in part (b) of Prob. 5.1.

Fig. 5-6

Since the $X_n$'s are independent, we have

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_1) P(X_2 = x_2) \cdots P(X_n = x_n) \qquad (5.67)$$

Thus, for the sample sequence of Fig. 5-3,

$$P(X_1 = 1, X_2 = 0, X_3 = 0, X_4 = 1, X_5 = 1, X_6 = 1, X_7 = 0, X_8 = 1, X_9 = 1, X_{10} = 0) = p^6 q^4$$

Fig. 5-7
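The product rule of Eq. (5.67) is easy to evaluate for any particular sample sequence. The sketch below applies it to the sequence of Fig. 5-3 (six 1s and four 0s) and compares the result with $p^6 q^4$; the value $p = 2/3$ is an arbitrary assumption and `sequence_probability` is a hypothetical helper name.

```python
from fractions import Fraction

def sequence_probability(sequence, p):
    """P(X_1 = x_1, ..., X_n = x_n) for independent Bernoulli r.v.'s, Eq. (5.67)."""
    prob = 1
    for x in sequence:
        prob *= p if x == 1 else 1 - p   # factor p for a 1, q = 1 - p for a 0
    return prob

sample = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]  # sample sequence of Fig. 5-3
p = Fraction(2, 3)                       # assumed value of p, for illustration
q = 1 - p

print(sequence_probability(sample, p), p**6 * q**4)
```

Any other sequence with the same numbers of 1s and 0s has the same probability, since only the counts enter the product.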

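5.7. Consider the random process $X(t)$ of Prob. 5.4. Determine the pdf's of $X(t)$ at $t = 0, \pi/4\omega, \pi/2\omega, \pi/\omega$.

For $t = 0$, $X(0) = Y \cos 0 = Y$. Thus,

$$f_{X(0)}(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

For $t = \pi/4\omega$, $X(\pi/4\omega) = Y \cos \pi/4 = (1/\sqrt{2})\, Y$. Thus,

$$f_{X(\pi/4\omega)}(x) = \begin{cases} \sqrt{2} & 0 < x < 1/\sqrt{2} \\ 0 & \text{otherwise} \end{cases}$$

For $t = \pi/2\omega$, $X(\pi/2\omega) = Y \cos \pi/2 = 0$; that is, $X(\pi/2\omega) = 0$ irrespective of the value of $Y$. Thus, the pmf of $X(\pi/2\omega)$ is

$$P[X(\pi/2\omega) = 0] = 1$$

For $t = \pi/\omega$, $X(\pi/\omega) = Y \cos \pi = -Y$. Thus,

$$f_{X(\pi/\omega)}(x) = \begin{cases} 1 & -1 < x < 0 \\ 0 & \text{otherwise} \end{cases}$$

5.8. Derive the first-order probability distribution of the simple random walk $X(n)$ of Prob. 5.2.

The first-order probability distribution of the simple random walk $X(n)$ is given by

$$p_n(k) = P(X_n = k)$$

where $k$ is an integer. Note that $P(X_0 = 0) = 1$. We note that $p_n(k) = 0$ if $n < |k|$ because the simple random walk cannot get to level $k$ in less than $|k|$ steps. Thus, $n \ge |k|$.

Let $N_n^+$ and $N_n^-$ be the r.v.'s denoting the numbers of $+1$s and $-1$s, respectively, in the first $n$ steps. Then

$$n = N_n^+ + N_n^- \qquad (5.68)$$

$$X_n = N_n^+ - N_n^- \qquad (5.69)$$

Adding Eqs. (5.68) and (5.69), we get

$$N_n^+ = \tfrac{1}{2}(n + X_n) \qquad (5.70)$$

Thus, $X_n = k$ if and only if $N_n^+ = \tfrac{1}{2}(n + k)$. From Eq. (5.70), we note that $2N_n^+ = n + X_n$ must be even. Thus, $X_n$ must be even if $n$ is even, and $X_n$ must be odd if $n$ is odd. We note that $N_n^+$ is a binomial r.v. with parameters $(n, p)$. Thus, by Eq. (2.36), we obtain

$$p_n(k) = \binom{n}{(n + k)/2} p^{(n+k)/2} q^{(n-k)/2} \qquad q = 1 - p \qquad (5.71)$$

where $n \ge |k|$, and $n$ and $k$ are either both even or both odd.

5.9. Consider the simple random walk $X(n)$ of Prob. 5.2.

(a) Find the probability that $X(n) = -2$ after four steps.
(b) Verify the result of part (a) by enumerating all possible sample sequences that lead to the value $X(n) = -2$ after four steps.

(a) Setting $k = -2$ and $n = 4$ in Eq. (5.71), we obtain

$$P(X_4 = -2) = p_4(-2) = \binom{4}{1} p q^3 = 4pq^3$$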

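Fig. 5-8

(b) All possible sample functions that lead to the value $X_4 = -2$ after 4 steps are shown in Fig. 5-8. Each such sample sequence has probability $pq^3$. There are only four sample functions that lead to the value $X_4 = -2$ after four steps. Thus $P(X_4 = -2) = 4pq^3$.

5.10. Find the mean and variance of the simple random walk $X(n)$ of Prob. 5.2.

From Eq. (5.66), we have

$$X_n = X_{n-1} + Z_n \qquad n = 1, 2, \ldots \qquad (5.72)$$

and $X_0 = 0$ and $Z_n$ $(n = 1, 2, \ldots)$ are independent and identically distributed (iid) r.v.'s with

$$P(Z_n = +1) = p \qquad P(Z_n = -1) = q = 1 - p$$

From Eq. (5.72), we observe that

$$X_1 = X_0 + Z_1 = Z_1, \quad X_2 = X_1 + Z_2 = Z_1 + Z_2, \quad \ldots, \quad X_n = Z_1 + Z_2 + \cdots + Z_n \qquad (5.73)$$

Then, because the $Z_n$ are iid r.v.'s and $X_0 = 0$, by Eqs. (4.108) and (4.112), we have

$$E(X_n) = E\Big(\sum_{k=1}^{n} Z_k\Big) = n E(Z_k)$$

$$\mathrm{Var}(X_n) = \mathrm{Var}\Big(\sum_{k=1}^{n} Z_k\Big) = n\, \mathrm{Var}(Z_k)$$

Now

$$E(Z_k) = (1)p + (-1)q = p - q \qquad (5.74)$$

$$E(Z_k^2) = (1)^2 p + (-1)^2 q = p + q = 1 \qquad (5.75)$$

Thus

$$\mathrm{Var}(Z_k) = E(Z_k^2) - [E(Z_k)]^2 = 1 - (p - q)^2 = 4pq$$

Hence,

$$E(X_n) = n(p - q) \qquad \mathrm{Var}(X_n) = 4npq \qquad q = 1 - p$$

Note that if $p = q = \tfrac{1}{2}$, then

$$E(X_n) = 0 \qquad \mathrm{Var}(X_n) = n$$

5.11. Find the autocorrelation function $R_X(n, m)$ of the simple random walk $X(n)$ of Prob. 5.2.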

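From Eq. (5.73), we can express $X_n$ as

$$X_n = \sum_{i=0}^{n} Z_i \qquad n = 0, 1, \ldots$$

where $Z_0 = X_0 = 0$ and $Z_i$ $(i \ge 1)$ are iid r.v.'s with

$$P(Z_i = +1) = p \qquad P(Z_i = -1) = q = 1 - p$$

Since $Z_0 = 0$, only the terms with $i, k \ge 1$ contribute, and

$$R_X(n, m) = E(X_n X_m) = \sum_{i=1}^{n} \sum_{k=1}^{m} E(Z_i Z_k) = \sum_{i=1}^{\min(n,m)} E(Z_i^2) + \sum_{i \ne k} E(Z_i) E(Z_k)$$

There are $\min(n, m)$ terms with $i = k$ and $nm - \min(n, m)$ terms with $i \ne k$. Using Eqs. (5.74) and (5.75), we obtain

$$R_X(n, m) = \min(n, m) + [nm - \min(n, m)](p - q)^2$$

or

$$R_X(n, m) = \begin{cases} m + (nm - m)(p - q)^2 & m < n \\ n + (nm - n)(p - q)^2 & n < m \end{cases}$$

Note that if $p = q = \tfrac{1}{2}$, then

$$R_X(n, m) = \min(n, m) \qquad n, m > 0$$

5.12. Consider the random process $X(t)$ of Prob. 5.4; that is,

$$X(t) = Y \cos \omega t \qquad t \ge 0$$

where $\omega$ is a constant and $Y$ is a uniform r.v. over $(0, 1)$.

(a) Find $E[X(t)]$.
(b) Find the autocorrelation function $R_X(t, s)$ of $X(t)$.
(c) Find the autocovariance function $K_X(t, s)$ of $X(t)$.

(a) From Eqs. (2.46) and (2.91), we have $E(Y) = \tfrac{1}{2}$ and $E(Y^2) = \tfrac{1}{3}$. Thus

$$E[X(t)] = E(Y \cos \omega t) = E(Y) \cos \omega t = \tfrac{1}{2} \cos \omega t$$

(b) By Eq. (5.7), we have

$$R_X(t, s) = E[X(t)X(s)] = E(Y^2 \cos \omega t \cos \omega s) = E(Y^2) \cos \omega t \cos \omega s = \tfrac{1}{3} \cos \omega t \cos \omega s$$

(c) By the definition of the autocovariance function, we have

$$K_X(t, s) = R_X(t, s) - E[X(t)]E[X(s)] = \tfrac{1}{3} \cos \omega t \cos \omega s - \tfrac{1}{4} \cos \omega t \cos \omega s = \tfrac{1}{12} \cos \omega t \cos \omega s$$

5.13. Consider a discrete-parameter random process $X(n) = \{X_n, n \ge 1\}$ where the $X_n$'s are iid r.v.'s with common cdf $F_X(x)$, mean $\mu$, and variance $\sigma^2$.

(a) Find the joint cdf of $X(n)$.
(b) Find the mean of $X(n)$.
(c) Find the autocorrelation function $R_X(n, m)$ of $X(n)$.
(d) Find the autocovariance function $K_X(n, m)$ of $X(n)$.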

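(a) Since the $X_n$'s are iid r.v.'s with common cdf $F_X(x)$, the joint cdf of $X(n)$ is given by

$$F_X(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_X(x_i)$$

(b) The mean of $X(n)$ is

$$\mu_X(n) = E(X_n) = \mu \qquad \text{for all } n \qquad (5.90)$$

(c) If $n \ne m$, by Eqs. (5.7) and (5.90),

$$R_X(n, m) = E(X_n X_m) = E(X_n)E(X_m) = \mu^2$$

If $n = m$, then by Eq. (2.31),

$$E(X_n^2) = \mathrm{Var}(X_n) + [E(X_n)]^2 = \sigma^2 + \mu^2$$

Hence,

$$R_X(n, m) = \begin{cases} \mu^2 & n \ne m \\ \sigma^2 + \mu^2 & n = m \end{cases} \qquad (5.91)$$

(d) By the definition of the autocovariance function,

$$K_X(n, m) = R_X(n, m) - \mu_X(n)\mu_X(m) = \begin{cases} 0 & n \ne m \\ \sigma^2 & n = m \end{cases}$$

CLASSIFICATION OF RANDOM PROCESSES

5.14. Show that a random process which is stationary to order $n$ is also stationary to all orders lower than $n$.

Assume that Eq. (5.14) holds for some particular $n$; that is,

$$P\{X(t_1) \le x_1, \ldots, X(t_n) \le x_n\} = P\{X(t_1 + \tau) \le x_1, \ldots, X(t_n + \tau) \le x_n\}$$

for any $\tau$. Letting $x_n \to \infty$, we have [see Eq. (3.63)]

$$P\{X(t_1) \le x_1, \ldots, X(t_{n-1}) \le x_{n-1}\} = P\{X(t_1 + \tau) \le x_1, \ldots, X(t_{n-1} + \tau) \le x_{n-1}\}$$

and the process is stationary to order $n - 1$. Continuing the same procedure, we see that the process is stationary to all orders lower than $n$.

5.15. Show that if $\{X(t), t \in T\}$ is a strict-sense stationary random process, then it is also WSS.

Since $X(t)$ is strict-sense stationary, the first- and second-order distributions are invariant through time translation for all $\tau \in T$. Then we have

$$\mu_X(t) = E[X(t)] = E[X(t + \tau)] = \mu_X(t + \tau)$$

and hence the mean function $\mu_X(t)$ must be constant; that is,

$$E[X(t)] = \mu \quad \text{(constant)}$$

Similarly, we have

$$E[X(s)X(t)] = E[X(s + \tau)X(t + \tau)]$$

so that the autocorrelation function would depend on the time points $s$ and $t$ only through the difference $|t - s|$. Thus, $X(t)$ is WSS.

5.16. Let $\{X_n, n \ge 0\}$ be a sequence of iid r.v.'s with mean 0 and variance 1. Show that $\{X_n, n \ge 0\}$ is a WSS process.

By Eq. (5.90),

$$E(X_n) = 0 \quad \text{(constant)} \qquad \text{for all } n$$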

and by Eq. (5.91),

$$R_X(n, n + k) = E(X_n X_{n+k}) = \begin{cases} 0 & k \ne 0 \\ 1 & k = 0 \end{cases}$$

which depends only on $k$. Thus, $\{X_n\}$ is a WSS process.

5.17. Show that if a random process $X(t)$ is WSS, then it must also be covariance stationary.

If $X(t)$ is WSS, then

$$E[X(t)] = \mu \quad \text{(constant)} \qquad \text{for all } t$$

$$R_X(t, t + \tau) = R_X(\tau) \qquad \text{for all } t$$

Now

$$K_X(t, t + \tau) = \mathrm{Cov}[X(t), X(t + \tau)] = R_X(t, t + \tau) - E[X(t)]E[X(t + \tau)] = R_X(\tau) - \mu^2$$

which indicates that $K_X(t, t + \tau)$ depends only on $\tau$; thus, $X(t)$ is covariance stationary.

5.18. Consider a random process $X(t)$ defined by

$$X(t) = U \cos \omega t + V \sin \omega t \qquad -\infty < t < \infty$$

where $\omega$ is a constant and $U$ and $V$ are r.v.'s.

(a) Show that the condition

$$E(U) = E(V) = 0$$

is necessary for $X(t)$ to be stationary.

(b) Show that $X(t)$ is WSS if and only if $U$ and $V$ are uncorrelated with equal variance; that is,

$$E(UV) = 0 \qquad E(U^2) = E(V^2) = \sigma^2 \qquad (5.95)$$

(a) Now

$$\mu_X(t) = E[X(t)] = E(U) \cos \omega t + E(V) \sin \omega t$$

must be independent of $t$ for $X(t)$ to be stationary. This is possible only if $\mu_X(t) = 0$, that is, $E(U) = E(V) = 0$.

(b) If $X(t)$ is WSS, then

$$E[X^2(0)] = E[X^2(\pi/2\omega)] = R_X(0) = \sigma_X^2$$

But $X(0) = U$ and $X(\pi/2\omega) = V$; thus

$$E(U^2) = E(V^2) = \sigma_X^2 = \sigma^2$$

Using the above result, we obtain

$$R_X(t, t + \tau) = E[X(t)X(t + \tau)] = E\{(U \cos \omega t + V \sin \omega t)[U \cos \omega(t + \tau) + V \sin \omega(t + \tau)]\} = \sigma^2 \cos \omega\tau + E(UV) \sin(2\omega t + \omega\tau) \qquad (5.96)$$

which will be a function of $\tau$ only if $E(UV) = 0$. Conversely, if $E(UV) = 0$ and $E(U^2) = E(V^2) = \sigma^2$, then from the result of part (a) and Eq. (5.96), we have

$$\mu_X(t) = 0 \qquad R_X(t, t + \tau) = \sigma^2 \cos \omega\tau = R_X(\tau)$$

Hence, $X(t)$ is WSS.
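The conclusion of Prob. 5.18 can be checked by direct enumeration for a simple choice of $U$ and $V$. In the sketch below, $U$ and $V$ are assumed to be independent and to take the values $\pm 1$ with probability $\tfrac{1}{2}$ each, so that $E(U) = E(V) = 0$, $E(UV) = 0$, and $E(U^2) = E(V^2) = 1$; the values of $\omega$ and $\tau$ are arbitrary, and the function name `autocorrelation` is hypothetical. The computed $R_X(t, t + \tau)$ should then equal $\sigma^2 \cos \omega\tau = \cos \omega\tau$ for every $t$, as Eq. (5.96) predicts.

```python
import math
from itertools import product

def autocorrelation(pmf, omega, t, tau):
    """R_X(t, t + tau) for X(t) = U cos(omega t) + V sin(omega t).

    U and V are independent, each distributed according to pmf,
    a list of (value, probability) pairs.
    """
    total = 0.0
    for (u, pu), (v, pv) in product(pmf, repeat=2):
        x_t = u * math.cos(omega * t) + v * math.sin(omega * t)
        x_s = u * math.cos(omega * (t + tau)) + v * math.sin(omega * (t + tau))
        total += pu * pv * x_t * x_s
    return total

pmf = [(-1.0, 0.5), (1.0, 0.5)]   # assumed distribution: zero mean, unit variance
omega, tau = 2.0, 0.4             # assumed values, for illustration only

for t in (0.0, 0.3, 1.1, 2.0):
    # The two printed numbers should agree for every t.
    print(t, autocorrelation(pmf, omega, t, tau), math.cos(omega * tau))
```

Replacing `pmf` with a distribution of nonzero mean makes $E(UV) = E(U)E(V) \ne 0$, and the computed value then varies with `t`, in line with the $\sin(2\omega t + \omega\tau)$ term of Eq. (5.96).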

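5.19. Consider a random process $X(t)$ defined by

$$X(t) = U \cos t + V \sin t \qquad -\infty < t < \infty$$

where $U$ and $V$ are independent r.v.'s, each of which assumes the values $-2$ and $1$ with the probabilities $\tfrac{1}{3}$ and $\tfrac{2}{3}$, respectively. Show that $X(t)$ is WSS but not strict-sense stationary.

We have

$$E(U) = E(V) = \tfrac{1}{3}(-2) + \tfrac{2}{3}(1) = 0$$

$$E(U^2) = E(V^2) = \tfrac{1}{3}(-2)^2 + \tfrac{2}{3}(1)^2 = 2$$

Since $U$ and $V$ are independent,

$$E(UV) = E(U)E(V) = 0$$

Thus, by the results of Prob. 5.18, $X(t)$ is WSS. To see if $X(t)$ is strict-sense stationary, we consider $E[X^3(t)]$.

$$E[X^3(t)] = E[(U \cos t + V \sin t)^3] = E(U^3) \cos^3 t + 3E(U^2 V) \cos^2 t \sin t + 3E(UV^2) \cos t \sin^2 t + E(V^3) \sin^3 t$$

Now

$$E(U^3) = E(V^3) = \tfrac{1}{3}(-2)^3 + \tfrac{2}{3}(1)^3 = -2$$

$$E(U^2 V) = E(U^2)E(V) = 0 \qquad E(UV^2) = E(U)E(V^2) = 0$$

Thus

$$E[X^3(t)] = -2(\cos^3 t + \sin^3 t)$$

which is a function of $t$. From Eq. (5.16), we see that all the moments of a strict-sense stationary process must be independent of time. Thus $X(t)$ is not strict-sense stationary.

5.20. Consider a random process $X(t)$ defined by

$$X(t) = A \cos(\omega t + \Theta) \qquad -\infty < t < \infty$$

where $A$ and $\omega$ are constants and $\Theta$ is a uniform r.v. over $(-\pi, \pi)$. Show that $X(t)$ is WSS.

From Eq. (2.44), we have

$$f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi} & -\pi < \theta < \pi \\ 0 & \text{otherwise} \end{cases}$$

Then

$$\mu_X(t) = \frac{A}{2\pi} \int_{-\pi}^{\pi} \cos(\omega t + \theta)\, d\theta = 0$$

Setting $s = t + \tau$ in Eq. (5.7), we have

$$R_X(t, t + \tau) = \frac{A^2}{2\pi} \int_{-\pi}^{\pi} \cos(\omega t + \theta) \cos[\omega(t + \tau) + \theta]\, d\theta = \frac{A^2}{2\pi} \int_{-\pi}^{\pi} \frac{1}{2}[\cos \omega\tau + \cos(2\omega t + 2\theta + \omega\tau)]\, d\theta = \frac{A^2}{2} \cos \omega\tau$$

Since the mean of $X(t)$ is a constant and the autocorrelation of $X(t)$ is a function of time difference only, we conclude that $X(t)$ is WSS.

5.21. Let $\{X(t), t \ge 0\}$ be a random process with stationary independent increments, and assume that $X(0) = 0$. Show that

$$E[X(t)] = \mu_1 t$$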

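where $\mu_1 = E[X(1)]$.

Let

$$f(t) = E[X(t)] = E[X(t) - X(0)]$$

Then, for any $t$ and $s$ and using Eq. (4.108) and the property of the stationary independent increments, we have

$$f(t + s) = E[X(t + s) - X(0)] = E[X(t + s) - X(s)] + E[X(s) - X(0)] = E[X(t) - X(0)] + E[X(s) - X(0)] = f(t) + f(s) \qquad (5.100)$$

The only solution to the above functional equation is $f(t) = ct$, where $c$ is a constant. Since $c = f(1) = E[X(1)]$, we obtain

$$E[X(t)] = \mu_1 t \qquad \mu_1 = E[X(1)]$$

5.22. Let $\{X(t), t \ge 0\}$ be a random process with stationary independent increments, and assume that $X(0) = 0$. Show that

(a) $\mathrm{Var}[X(t)] = \sigma_1^2 t \qquad (5.101)$
(b) $\mathrm{Var}[X(t) - X(s)] = \sigma_1^2 (t - s) \qquad t > s \qquad (5.102)$

where $\sigma_1^2 = \mathrm{Var}[X(1)]$.

(a) Let

$$g(t) = \mathrm{Var}[X(t)] = \mathrm{Var}[X(t) - X(0)]$$

Then, for any $t$ and $s$ and using Eq. (4.112) and the property of the stationary independent increments, we get

$$g(t + s) = \mathrm{Var}[X(t + s) - X(0)] = \mathrm{Var}[X(t + s) - X(s)] + \mathrm{Var}[X(s) - X(0)] = g(t) + g(s)$$

which is the same functional equation as Eq. (5.100). Thus, $g(t) = kt$, where $k$ is a constant. Since $k = g(1) = \mathrm{Var}[X(1)]$, we obtain

$$\mathrm{Var}[X(t)] = \sigma_1^2 t \qquad \sigma_1^2 = \mathrm{Var}[X(1)]$$

(b) Let $t > s$. Then

$$\mathrm{Var}[X(t)] = \mathrm{Var}[X(t) - X(s) + X(s) - X(0)] = \mathrm{Var}[X(t) - X(s)] + \mathrm{Var}[X(s)]$$

Thus, using Eq. (5.101), we obtain

$$\mathrm{Var}[X(t) - X(s)] = \mathrm{Var}[X(t)] - \mathrm{Var}[X(s)] = \sigma_1^2 (t - s)$$

5.23. Let $\{X(t), t \ge 0\}$ be a random process with stationary independent increments, and assume that $X(0) = 0$. Show that

$$K_X(t, s) = \sigma_1^2 \min(t, s)$$

where $\sigma_1^2 = \mathrm{Var}[X(1)]$.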

191 CHAP. 51 RANDOM PROCESSES By definition (2.28), Thus, Using Eqs. (3.1 01) and (5.1 02), we obtain or where ai2 = Var[X(l)]. Cov[X(t), X(s)] = +{Var[X(t)] + var[x(s)] - Var[X(t) - X(s)l) $a12[t + s - (t - s)] = a12s KX(4 s) = $a12[t + s - (S- t)] = a12t t > s s > t (a) Show that a simple random walk X(n) of Prob. 5.2 is a Markov chain. (b) Find its one-step transition probabilities. (a) From Eq. (5.73) (Prob. 5.10), X(n) = {X,, n 2 0) can be expressed as where Z, (n = 1,2,...) are iid r.v.'s with P(Z,=k)=ak ( 1-1 and a,=p a-,=q=l-p Then X(n) = {X,, n 2 0) is a Markov chain, since P(X,+l=i,+l~X,=O,Xl=i,,..., X,=i,) = P(Z,+, + in = in+, lxo = 0, X, = i,,..., X, = in) = P(Z,+l = in+, -in) = ain+i-in = P(X,+, = in+, IX, = in) (b) since Z,+, is independent of X,, X,,..., X,. The one-step transition probabilities are given by k=j+l pjk=p(x,=klx,-i =j)= 1 -p k=j-1 otherwise which do not depend on n. Thus, a simple random walk X(n) is a homogeneous Markov chain Show that for a Markov process X(t), the second-order distribution is suficient to characterize Let X(t) be a Markov process with the nth-order distribution Then, using the Markov property (5.26), we have Applying the above relation repeatedly for lower-order distribution, we can write

192 RANDOM PROCESSES [CHAP 5 Hence, all finite-order distributions of a Markov process can be completely determined by the second-order distribution Show that if a normal process is WSS, then it is also strict-sense stationary. By Eq. (5.29), a normal random process X(t) is completely characterized by the specification of the mean E[X(t)] and the covariance function Kx(t, s) of the process. Suppose that X(t) is WSS. Then, by Eqs. (5.21) and (5.22), Eq. (5.29) becomes Now we translate all of the time instants t,, t,,..., t, by the same amount z. The joint characteristic function of the new r.v.'s X(ti + z), i = 1, 2,..., n, is then which indicates that the joint characteristic function (and hence the corresponding joint pdf) is unaffected by a shift in the. time origin. Since this result holds for any n and any set of time instants (ti E T, i = 1,2,..., n), it follows that if a normal process is WSS, then it is also strict-sense stationary Let (X(t), - oo < t < oo} be a zero-mean, stationary, normal process with the autocorrelation function Let {X(ti), (0 otherwise i = 1,2,..., n) be a sequence of n samples of the process taken at the time instants Find the mean and the variance of the sample mean Since X(t) is zero-mean and stationary, we have and Rx(ti, tk) = E[X(ti)X(tk)] = RX(tk - ti) = Rx Thus

193 CHAP. 51 RANDOM PROCESSES By Eq. (5.107), Thus 1 1 n2 n Var(&) = - [n(l) + 2(n - l)(3) + 03 = - (2n - 1) DISCRETE-PARAMETER MARKOV CHAINS Show that if P is a Markov matrix, then Pn is also a Markov matrix for any positive integer n. Let Then by the property of a Markov matrix [Eq. (5.391, we can write where at=[l 1 11 Premultiplying both sides of Eq. (5.111) by P, we obtain P2a = Pa = a which indicates that P2 is also a Markov matrix. Repeated premultiplication by P yields which shows that P" is also a Markov matrix Verify Eq. (5.39); that is, We verify Eq. (5.39) by induction. If the state of X, is i, state XI will be j only if a transition is made from i to j. The events {X, = i, i = 1, 2,...} are mutually exclusive, and one of them must occur. Hence, by the law of total probability [Eq. (1.44)], In terms of vectors and matrices, Eq. (5.1 12) can be expressed as ~(1) = P(0)P Thus, Eq. (5.39) is true for n = 1. Assume now that Eq. (5.39) is true for n = k; that is, PW = p Wk

194 186 RANDOM PROCESSES [CHAP 5 Again, by the law of total probability, In terms of vectors and matrices, Eq. (5.1 14) can be expressed as which indicates that Eq. (5.39) is true for k + 1. Hence, we conclude that Eq. (5.39) is true for all n Consider a two-state Markov chain with the transition probability matrix (a) Show that the n-step transition probability matrix Pn is given by (b) Find Pn when n -, a. (a) From matrix analysis, the characteristic equation of P is Thus, the eigenvalues of P are 1, = 1 and A, = 1 - a - b. Then, using the spectral decomposition method, Pn can be expressed as where El and E, are constituent matrices of P, given by Pn = AlnE, + A2"E2 (5.1 18) 1 1 El =- [p E, =- [f' (5.1 19) A1 Substituting 1, = 1 and 1, = 1 - a - b in the above expressions, we obtain Thus, by Eq. (5.1 18), we obtain (b) IfO<a<l,O<b<l,thenO< 1-a< 1andI1-a-bI< l.solimn,,(l-a-b)"=oand Note that a limiting matrix exists and has the same rows (see Prob. 5.47) An example of a two-state Markov chain is provided by a communication network consisting of the sequence (or cascade) of stages of binary communication channels shown in Fig Here X,

195 CHAP. 51 RANDOM PROCESSES x,, -, = 0 I -a x,, = 0 Fig. 5-9 Binary communication network. denotes the digit leaving the nth stage of the channel and X, denotes the digit entering the first stage. The transition probability matrix of this communication network is often called the channel matrix and is given by Eq. (5.1 16); that is, Assume that a = 0.1 and b = 0.2, and the initial distribution is P(X, = 0) = P(X, = 1) = 0.5. (a) Find the distribution of X,. (b) Find the distribution of X, when n -, co. (a) The channel matrix of the communication network is and the initial distribution is By Eq. (5.39), the distribution of X, is given by Letting a = 0.1 and b = 0.2 in Eq. (5.1 17), we get Thus, the distribution of Xn is

that is,

$$P(X_n = 0) = \frac{2}{3} - \frac{(0.7)^n}{6} \qquad \text{and} \qquad P(X_n = 1) = \frac{1}{3} + \frac{(0.7)^n}{6}$$

(b) Since $\lim_{n \to \infty} (0.7)^n = 0$, the distribution of $X_n$ when $n \to \infty$ is

$$P(X_\infty = 0) = \tfrac{2}{3} \qquad \text{and} \qquad P(X_\infty = 1) = \tfrac{1}{3}$$

5.32. Verify the transitivity property of the Markov chain; that is, if $i \to j$ and $j \to k$, then $i \to k$.

By definition, the relations $i \to j$ and $j \to k$ imply that there exist integers $n$ and $m$ such that $p_{ij}^{(n)} > 0$ and $p_{jk}^{(m)} > 0$. Then, by the Chapman-Kolmogorov equation (5.38), we have

$$p_{ik}^{(n+m)} = \sum_r p_{ir}^{(n)} p_{rk}^{(m)} \ge p_{ij}^{(n)} p_{jk}^{(m)} > 0 \qquad (5.122)$$

Therefore $i \to k$.

5.33. Verify Eq. (5.42).

If the Markov chain $\{X_n\}$ first reaches state $j$ from state $i$ in exactly $m$ steps, the first step must take the chain from $i$ to some state $k$, where $k \ne j$. Now after that first step to $k$, we have $m - 1$ steps left, and the chain must get to state $j$, from state $k$, on the last of those steps. That is, the first visit to state $j$ must occur on the $(m - 1)$st step, starting now in state $k$. Thus we must have

$$f_{ij}^{(m)} = \sum_{k \ne j} p_{ik} f_{kj}^{(m-1)} \qquad m = 2, 3, \ldots$$

5.34. Show that in a finite-state Markov chain, not all states can be transient.

Suppose that the states are $0, 1, \ldots, m$, and suppose that they are all transient. Then by definition, after a finite amount of time (say $T_0$), state 0 will never be visited; after a finite amount of time (say $T_1$), state 1 will never be visited; and so on. Thus, after a finite time $T = \max\{T_0, T_1, \ldots, T_m\}$, no state will be visited. But as the process must be in some state after time $T$, we have a contradiction. Thus, we conclude that not all states can be transient and at least one of the states must be recurrent.

5.35. A state transition diagram of a finite-state Markov chain is a line diagram with a vertex corresponding to each state and a directed line between two vertices $i$ and $j$ if $p_{ij} > 0$. In such a diagram, if one can move from $i$ to $j$ by a path following the arrows, then $i \to j$. The diagram is useful to determine whether a finite-state Markov chain is irreducible or not, or to check for periodicities.
Draw the state transition diagrams and classify the states of the Markov chains with the following transition probability matrices:

[transition probability matrices (a), (b), and (c) are not legible in the source]

[Fig. 5-10 State transition diagram.]

(a) The state transition diagram of the Markov chain with $P$ of part (a) is shown in Fig. 5-10(a). From Fig. 5-10(a), it is seen that the Markov chain is irreducible and aperiodic. For instance, one can get back to state 0 in two steps by going from 0 to 1 to 0. However, one can also get back to state 0 in three steps by going from 0 to 1 to 2 to 0. Hence 0 is aperiodic. Similarly, we can see that states 1 and 2 are also aperiodic.

(b) The state transition diagram of the Markov chain with $P$ of part (b) is shown in Fig. 5-10(b). From Fig. 5-10(b), it is seen that the Markov chain is irreducible and periodic with period 3.

(c) The state transition diagram of the Markov chain with $P$ of part (c) is shown in Fig. 5-10(c). From Fig. 5-10(c), it is seen that the Markov chain is not irreducible, since states 0 and 4 do not communicate, and state 1 is absorbing.

Consider a Markov chain with state space $\{0, 1\}$ and transition probability matrix

$$P = \begin{bmatrix} 1 & 0 \\ \tfrac12 & \tfrac12 \end{bmatrix}$$

(a) Show that state 0 is recurrent.
(b) Show that state 1 is transient.

(a) By Eqs. (5.41) and (5.42), we have

$$f_{00}^{(1)} = p_{00} = 1 \qquad f_{00}^{(n)} = 0 \quad n \ge 2$$

Then, by Eqs. (5.43),

$$f_{00} = P(T_0 < \infty \mid X_0 = 0) = \sum_{n=1}^{\infty} f_{00}^{(n)} = 1 + 0 + 0 + \cdots = 1$$

Thus, by definition (5.44), state 0 is recurrent.

(b) Similarly, we have

$$f_{11}^{(1)} = p_{11} = \tfrac12 \qquad f_{11}^{(n)} = 0 \quad n \ge 2$$

and

$$f_{11} = P(T_1 < \infty \mid X_0 = 1) = \sum_{n=1}^{\infty} f_{11}^{(n)} = \tfrac12 + 0 + 0 + \cdots = \tfrac12 < 1$$

Thus, by definition (5.48), state 1 is transient.
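The first-passage recursion of Eq. (5.42), $f_{ij}^{(1)} = p_{ij}$ and $f_{ij}^{(m)} = \sum_{k \ne j} p_{ik} f_{kj}^{(m-1)}$, is easy to evaluate numerically. The sketch below is illustrative only: the function name `first_passage` is hypothetical, and the matrix used is an assumption, chosen because it is the two-state chain consistent with the values $f_{11}^{(1)} = \tfrac12$ and $f_{11}^{(n)} = 0$ for $n \ge 2$ in the preceding solution.

```python
from fractions import Fraction as F

def first_passage(P, i, j, m_max):
    """Return [f_ij^(1), ..., f_ij^(m_max)] using the recursion of Eq. (5.42)."""
    n = len(P)
    prev = [P[k][j] for k in range(n)]      # f_kj^(1) = p_kj for every start state k
    out = [prev[i]]
    for _ in range(2, m_max + 1):
        # first step goes to some r != j, then first passage r -> j in one step fewer
        cur = [sum(P[k][r] * prev[r] for r in range(n) if r != j) for k in range(n)]
        out.append(cur[i])
        prev = cur
    return out

# Assumed transition matrix (state 0 absorbing, state 1 leaves with probability 1/2)
P = [[F(1), F(0)],
     [F(1, 2), F(1, 2)]]

f00 = sum(first_passage(P, 0, 0, 20))   # probability of ever returning to state 0
f11 = sum(first_passage(P, 1, 1, 20))   # probability of ever returning to state 1
print(f00, f11)
```

A return probability equal to 1 marks the state as recurrent, while a value below 1 marks it as transient; exact rational arithmetic avoids any rounding question in that comparison.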

Consider a Markov chain with state space $\{0, 1, 2\}$ and transition probability matrix

[transition probability matrix not legible in the source]

Show that state 0 is periodic with period 2.

The characteristic equation of $P$ is given by

$$c(\lambda) = |\lambda I - P| = \lambda^3 - \lambda = 0$$

Thus, by the Cayley-Hamilton theorem (in matrix analysis), we have $P^3 = P$. Thus, for $n \ge 1$,

$$P^{(2n)} = P^2 \qquad P^{(2n+1)} = P$$

Therefore

$$d(0) = \gcd\{n \ge 1 : p_{00}^{(n)} > 0\} = \gcd\{2, 4, 6, \ldots\} = 2$$

Thus, state 0 is periodic with period 2. Note that the state transition diagram corresponding to the given $P$ is shown in Fig. 5-11. From Fig. 5-11, it is clear that state 0 is periodic with period 2.

[Fig. 5-11]

Let two gamblers, A and B, initially have $k$ dollars and $m$ dollars, respectively. Suppose that at each round of their game, A wins one dollar from B with probability $p$ and loses one dollar to B with probability $q = 1 - p$. Assume that A and B play until one of them has no money left. (This is known as the Gambler's Ruin problem.) Let $X_n$ be A's capital after round $n$, where $n = 0, 1, 2, \ldots$ and $X_0 = k$.

(a) Show that $X(n) = \{X_n, n \ge 0\}$ is a Markov chain with absorbing states.
(b) Find its transition probability matrix $P$.

(a) The total capital of the two players at all times is

$$k + m = N$$

Let $Z_n$ $(n \ge 1)$ be independent r.v.'s with $P(Z_n = 1) = p$ and $P(Z_n = -1) = q = 1 - p$ for all $n$. Then

$$X_n = X_{n-1} + Z_n \qquad n = 1, 2, \ldots$$

and $X_0 = k$. The game ends when $X_n = 0$ or $X_n = N$. Thus, by Probs. 5.2 and 5.24, $X(n) = \{X_n, n \ge 0\}$ is a Markov chain with state space $E = \{0, 1, 2, \ldots, N\}$, where states 0 and $N$ are absorbing states. The Markov chain $X(n)$ is also known as a simple random walk with absorbing barriers.

(b) Since

$$p_{i,i+1} = P(X_{n+1} = i + 1 \mid X_n = i) = p \qquad p_{i,i-1} = P(X_{n+1} = i - 1 \mid X_n = i) = q \qquad i \ne 0, N$$

$$p_{0,0} = P(X_{n+1} = 0 \mid X_n = 0) = 1 \qquad p_{N,N} = P(X_{n+1} = N \mid X_n = N) = 1$$

the transition probability matrix $P$ is

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\ 0 & q & 0 & p & \cdots & 0 & 0 & 0 \\ \vdots & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & q & 0 & p \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix} \tag{5.123}$$

For example, when $p = q = \tfrac12$ and $N = 4$,

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

Consider a homogeneous Markov chain $X(n) = \{X_n, n \ge 0\}$ with a finite state space $E = \{0, 1, \ldots, N\}$, of which $A = \{0, 1, \ldots, m\}$, $m \ge 1$, is a set of absorbing states and $B = \{m + 1, \ldots, N\}$ is a set of nonabsorbing states. It is assumed that at least one of the absorbing states in $A$ is accessible from any nonabsorbing state in $B$. Show that absorption of $X(n)$ in one or another of the absorbing states is certain.

If $X_0 \in A$, then there is nothing to prove, since $X(n)$ is already absorbed. Let $X_0 \in B$. By assumption, there is at least one state in $A$ which is accessible from any state in $B$. Now assume that state $k \in A$ is accessible from $j \in B$. Let $n_{jk}$ $(< \infty)$ be the smallest number $n$ such that $p_{jk}^{(n)} > 0$. For a given state $j$, let $n_j$ be the largest of $n_{jk}$ as $k$ varies and $n'$ be the largest of $n_j$ as $j$ varies. After $n'$ steps, no matter what the initial state of $X(n)$, there is a probability $p > 0$ that $X(n)$ is in an absorbing state. Therefore

$$P\{X_{n'} \in B\} = 1 - p$$

and $0 < 1 - p < 1$. It follows by homogeneity and the Markov property that

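$$P\{X_{kn'} \in B\} = (1 - p)^k \qquad k = 1, 2, \ldots$$

Now since $\lim_{k\to\infty}(1 - p)^k = 0$, we have

$$\lim_{n\to\infty} P\{X_n \in B\} = 0 \qquad \text{or} \qquad \lim_{n\to\infty} P\{X_n \in \bar{B} = A\} = 1$$

which shows that absorption of $X(n)$ in one or another of the absorbing states is certain.

Verify Eq. (5.50).

Let $X(n) = \{X_n, n \ge 0\}$ be a homogeneous Markov chain with a finite state space $E = \{0, 1, \ldots, N\}$, of which $A = \{0, 1, \ldots, m\}$, $m \ge 1$, is a set of absorbing states and $B = \{m + 1, \ldots, N\}$ is a set of nonabsorbing states. Let state $k \in B$ at the first step go to $i \in E$ with probability $p_{ki}$. Then

$$u_{kj} = P\{X_n = j\ (\in A) \mid X_0 = k\ (\in B)\} = \sum_{i \in E} p_{ki}\, P\{X_n = j\ (\in A) \mid X_1 = i\} \tag{5.124}$$

Now

$$P\{X_n = j\ (\in A) \mid X_1 = i\} = \begin{cases} 1 & i = j \\ 0 & i \in A,\ i \ne j \\ u_{ij} & i \in B \end{cases}$$

Then Eq. (5.124) becomes

$$u_{kj} = p_{kj} + \sum_{i=m+1}^{N} p_{ki}\, u_{ij} \qquad k = m + 1, \ldots, N;\ j = 1, \ldots, m \tag{5.125}$$

But $p_{kj}$, $k = m + 1, \ldots, N$; $j = 1, \ldots, m$, are the elements of $R$, whereas $p_{ki}$, $k = m + 1, \ldots, N$; $i = m + 1, \ldots, N$, are the elements of $Q$ [see Eq. (5.49a)]. Hence, in matrix notation, Eq. (5.125) can be expressed as

$$U = R + QU \qquad \text{or} \qquad (I - Q)U = R \tag{5.126}$$

Premultiplying both sides of the second equation of Eq. (5.126) by $(I - Q)^{-1}$, we obtain

$$U = (I - Q)^{-1}R = \Phi R$$

Consider a simple random walk $X(n)$ with absorbing barriers at state 0 and state $N = 3$ (see Prob. 5.38).

(a) Find the transition probability matrix $P$.
(b) Find the probabilities of absorption into states 0 and 3.

(a) The transition probability matrix $P$ is [Eq. (5.123)]

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

where the rows and columns are ordered 0, 1, 2, 3.

(b) Rearranging the transition probability matrix $P$ as [Eq. (5.49a)],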

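$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ q & 0 & 0 & p \\ 0 & p & q & 0 \end{bmatrix}$$

where the rows and columns are ordered 0, 3, 1, 2, and by Eq. (5.49b), the matrices $Q$ and $R$ are given by

$$R = \begin{bmatrix} q & 0 \\ 0 & p \end{bmatrix} \qquad Q = \begin{bmatrix} 0 & p \\ q & 0 \end{bmatrix}$$

Then

$$I - Q = \begin{bmatrix} 1 & -p \\ -q & 1 \end{bmatrix}$$

and

$$\Phi = (I - Q)^{-1} = \frac{1}{1 - pq}\begin{bmatrix} 1 & p \\ q & 1 \end{bmatrix} \tag{5.127}$$

By Eq. (5.50),

$$U = \begin{bmatrix} u_{10} & u_{13} \\ u_{20} & u_{23} \end{bmatrix} = \Phi R = \frac{1}{1 - pq}\begin{bmatrix} q & p^2 \\ q^2 & p \end{bmatrix} \tag{5.128}$$

Thus, the probabilities of absorption into state 0 from states 1 and 2 are given, respectively, by

$$u_{10} = \frac{q}{1 - pq} \qquad \text{and} \qquad u_{20} = \frac{q^2}{1 - pq}$$

and the probabilities of absorption into state 3 from states 1 and 2 are given, respectively, by

$$u_{13} = \frac{p^2}{1 - pq} \qquad \text{and} \qquad u_{23} = \frac{p}{1 - pq}$$

Note that

$$u_{10} + u_{13} = \frac{q + p^2}{1 - pq} = \frac{1 - p(1 - p)}{1 - pq} = 1 \qquad u_{20} + u_{23} = \frac{q^2 + p}{1 - pq} = \frac{1 - q(1 - q)}{1 - pq} = 1$$

which confirm the proposition of Prob. 5.39.

Consider the simple random walk $X(n)$ with absorbing barriers at 0 and 3 (Prob. 5.41). Find the expected time (or steps) to absorption when $X_0 = 1$ and when $X_0 = 2$.

The fundamental matrix $\Phi$ of $X(n)$ is [Eq. (5.127)]

$$\Phi = \frac{1}{1 - pq}\begin{bmatrix} 1 & p \\ q & 1 \end{bmatrix}$$

Let $T_i$ be the time to absorption when $X_0 = i$. Then by Eq. (5.51), we get

$$E(T_1) = \frac{1 + p}{1 - pq} \qquad E(T_2) = \frac{1 + q}{1 - pq}$$

Consider the gambler's game described in Prob. 5.38. What is the probability of A's losing all his money?

Let $P(k)$, $k = 0, 1, 2, \ldots, N$, denote the probability that A loses all his money when his initial capital is $k$ dollars. Equivalently, $P(k)$ is the probability of absorption at state 0 when $X_0 = k$ in the simple random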

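walk $X(n)$ with absorbing barriers at states 0 and $N$. Now if $0 < k < N$, then

$$P(k) = pP(k + 1) + qP(k - 1) \qquad k = 1, 2, \ldots, N - 1 \tag{5.130}$$

where $pP(k + 1)$ is the probability that A wins the first round and subsequently loses all his money and $qP(k - 1)$ is the probability that A loses the first round and subsequently loses all his money. Rewriting Eq. (5.130), we have

$$P(k + 1) - \frac{1}{p}P(k) + \frac{q}{p}P(k - 1) = 0 \qquad k = 1, 2, \ldots, N - 1 \tag{5.131}$$

which is a second-order homogeneous linear constant-coefficient difference equation. Next, we have

$$P(0) = 1 \qquad \text{and} \qquad P(N) = 0 \tag{5.132}$$

since if $k = 0$, absorption at 0 is a sure event, and if $k = N$, absorption at $N$ has occurred and absorption at 0 is impossible. Thus, finding $P(k)$ reduces to solving Eq. (5.131) subject to the boundary conditions given by Eq. (5.132). Let $P(k) = r^k$. Then Eq. (5.131) becomes

$$r^{k+1} - \frac{1}{p}r^k + \frac{q}{p}r^{k-1} = 0$$

Setting $k = 1$ (and noting that $p + q = 1$), we get

$$r^2 - \frac{1}{p}r + \frac{q}{p} = (r - 1)\left(r - \frac{q}{p}\right) = 0$$

from which we get $r = 1$ and $r = q/p$. Thus,

$$P(k) = c_1 + c_2\left(\frac{q}{p}\right)^k \qquad p \ne q \tag{5.133}$$

where $c_1$ and $c_2$ are arbitrary constants. Now, by Eq. (5.132),

$$P(0) = 1 \;\Rightarrow\; c_1 + c_2 = 1 \qquad\qquad P(N) = 0 \;\Rightarrow\; c_1 + c_2\left(\frac{q}{p}\right)^N = 0$$

Solving for $c_1$ and $c_2$, we obtain

$$c_1 = \frac{-(q/p)^N}{1 - (q/p)^N} \qquad c_2 = \frac{1}{1 - (q/p)^N}$$

Hence

$$P(k) = \frac{(q/p)^k - (q/p)^N}{1 - (q/p)^N} \qquad p \ne q \tag{5.134}$$

Note that if $N \gg k$,

$$P(k) \approx \begin{cases} 1 & q > p \\ (q/p)^k & p > q \end{cases}$$

Setting $r = q/p$ in Eq. (5.134), we have

$$P(k) = \frac{r^k - r^N}{1 - r^N} \;\longrightarrow\; 1 - \frac{k}{N} \quad \text{as } r \to 1$$

Thus, when $p = q = \tfrac12$,

$$P(k) = 1 - \frac{k}{N}$$

Show that Eq. (5.134) is consistent with Eq. (5.128).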

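Substituting $k = 1$ and $N = 3$ in Eq. (5.134), and noting that $p + q = 1$, we have

$$P(1) = \frac{(q/p) - (q/p)^3}{1 - (q/p)^3} = \frac{q(p^2 - q^2)}{p^3 - q^3} = \frac{q(p + q)}{p^2 + pq + q^2} = \frac{q}{(p + q)^2 - pq} = \frac{q}{1 - pq}$$

Now from Eq. (5.128), we have

$$u_{10} = \frac{q}{1 - pq} = P(1)$$

Consider the simple random walk $X(n)$ with state space $E = \{0, 1, 2, \ldots, N\}$, where 0 and $N$ are absorbing states (Prob. 5.38). Let r.v. $T_k$ denote the time (or number of steps) to absorption of $X(n)$ when $X_0 = k$, $k = 0, 1, \ldots, N$. Find $E(T_k)$.

Let $Y(k) = E(T_k)$. Clearly, if $k = 0$ or $k = N$, then absorption is immediate, and we have

$$Y(0) = Y(N) = 0 \tag{5.137}$$

Let the probability that absorption takes $m$ steps when $X_0 = k$ be defined by

$$P(k, m) = P(T_k = m) \qquad m = 1, 2, \ldots$$

Then, we have (Fig. 5-12)

$$P(k, m) = pP(k + 1, m - 1) + qP(k - 1, m - 1)$$

and

$$Y(k) = E(T_k) = \sum_{m=1}^{\infty} mP(k, m) = p\sum_{m=1}^{\infty} mP(k + 1, m - 1) + q\sum_{m=1}^{\infty} mP(k - 1, m - 1)$$

Setting $m - 1 = i$, we get

$$Y(k) = p\sum_{i=0}^{\infty}(i + 1)P(k + 1, i) + q\sum_{i=0}^{\infty}(i + 1)P(k - 1, i)$$

$$= p\sum_{i=0}^{\infty} iP(k + 1, i) + q\sum_{i=0}^{\infty} iP(k - 1, i) + p\sum_{i=0}^{\infty} P(k + 1, i) + q\sum_{i=0}^{\infty} P(k - 1, i)$$

[Fig. 5-12 Simple random walk with absorbing barriers.]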

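Now by the result of Prob. 5.39, we see that absorption is certain; therefore

$$\sum_{i=0}^{\infty} P(k + 1, i) = \sum_{i=0}^{\infty} P(k - 1, i) = 1$$

Thus

$$Y(k) = pY(k + 1) + qY(k - 1) + p + q \tag{5.140}$$

Rewriting Eq. (5.140), we have

$$Y(k + 1) - \frac{1}{p}Y(k) + \frac{q}{p}Y(k - 1) = -\frac{1}{p} \tag{5.141}$$

Thus, finding $Y(k)$ reduces to solving Eq. (5.141) subject to the boundary conditions given by Eq. (5.137). Let the general solution of Eq. (5.141) be

$$Y(k) = Y_h(k) + Y_p(k)$$

where $Y_h(k)$ is the homogeneous solution satisfying

$$Y_h(k + 1) - \frac{1}{p}Y_h(k) + \frac{q}{p}Y_h(k - 1) = 0 \tag{5.142}$$

and $Y_p(k)$ is the particular solution satisfying

$$Y_p(k + 1) - \frac{1}{p}Y_p(k) + \frac{q}{p}Y_p(k - 1) = -\frac{1}{p} \tag{5.143}$$

Let $Y_p(k) = \alpha k$, where $\alpha$ is a constant. Then Eq. (5.143) becomes

$$(k + 1)\alpha - \frac{1}{p}k\alpha + \frac{q}{p}(k - 1)\alpha = -\frac{1}{p}$$

from which we get $\alpha = 1/(q - p)$ and

$$Y_p(k) = \frac{k}{q - p} \qquad p \ne q$$

Since Eq. (5.142) is the same as Eq. (5.131), by Eq. (5.133), we obtain

$$Y_h(k) = c_1 + c_2\left(\frac{q}{p}\right)^k \qquad p \ne q$$

where $c_1$ and $c_2$ are arbitrary constants. Hence, the general solution of Eq. (5.141) is

$$Y(k) = \frac{k}{q - p} + c_1 + c_2\left(\frac{q}{p}\right)^k \qquad p \ne q \tag{5.146}$$

Now, by Eq. (5.137),

$$Y(0) = 0 \;\Rightarrow\; c_1 + c_2 = 0 \qquad\qquad Y(N) = 0 \;\Rightarrow\; \frac{N}{q - p} + c_1 + c_2\left(\frac{q}{p}\right)^N = 0$$

Solving for $c_1$ and $c_2$, we obtain

$$c_1 = \frac{-N/(q - p)}{1 - (q/p)^N} \qquad c_2 = \frac{N/(q - p)}{1 - (q/p)^N}$$

Substituting these values in Eq. (5.146), we obtain (for $p \ne q$)

$$Y(k) = E(T_k) = \frac{1}{q - p}\left[k - N\,\frac{1 - (q/p)^k}{1 - (q/p)^N}\right]$$

When $p = q = \tfrac12$, we have

$$Y(k) = E(T_k) = k(N - k) \qquad p = q = \tfrac12$$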
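The closed-form ruin probability of Eq. (5.134) and the expected duration just derived can be cross-checked against a direct solution of the underlying linear systems $(I - Q)\mathbf{x} = \mathbf{b}$. The sketch below is an illustration under stated assumptions rather than the book's method: the helper names `solve`, `ruin_and_duration`, `ruin_closed`, and `duration_closed` are hypothetical, and exact rational arithmetic from the standard library is used so that the comparison is free of rounding.

```python
from fractions import Fraction as F

def solve(A, b):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def ruin_and_duration(N, p):
    """Solve for P(k) and Y(k), k = 1..N-1, from the difference equations."""
    q = 1 - p
    n = N - 1
    A = [[F(0)] * n for _ in range(n)]          # the matrix I - Q on states 1..N-1
    for k in range(1, N):
        A[k - 1][k - 1] = F(1)
        if k + 1 < N:
            A[k - 1][k] = -p
        if k - 1 > 0:
            A[k - 1][k - 2] = -q
    ruin = solve(A, [q if k == 1 else F(0) for k in range(1, N)])  # uses P(0) = 1
    dur = solve(A, [F(1)] * n)                                     # Y = pY(+1) + qY(-1) + 1
    return ruin, dur

def ruin_closed(k, N, p):
    q = 1 - p
    if p == q:
        return 1 - F(k, N)
    r = q / p
    return (r**k - r**N) / (1 - r**N)

def duration_closed(k, N, p):
    q = 1 - p
    if p == q:
        return F(k * (N - k))
    r = q / p
    return (k - N * (1 - r**k) / (1 - r**N)) / (q - p)

ruin, dur = ruin_and_duration(3, F(2, 3))
print(ruin, [ruin_closed(k, 3, F(2, 3)) for k in (1, 2)])
print(dur, [duration_closed(k, 3, F(2, 3)) for k in (1, 2)])
```

For $N = 3$ the two systems are exactly the ones solved with the fundamental matrix in Probs. 5.41 and 5.42, so the same code also checks $u_{10} = q/(1 - pq)$ and $E(T_1) = (1 + p)/(1 - pq)$.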

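Consider a Markov chain with two states and transition probability matrix

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

(a) Find the stationary distribution $\hat{\mathbf{p}}$ of the chain.
(b) Find $\lim_{n\to\infty} P^n$.

(a) By definition (5.52),

$$\hat{\mathbf{p}}P = \hat{\mathbf{p}} \qquad \text{or} \qquad [p_1 \quad p_2]\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = [p_1 \quad p_2]$$

which yields $p_1 = p_2$. Since $p_1 + p_2 = 1$, we obtain

$$\hat{\mathbf{p}} = [\tfrac12 \quad \tfrac12]$$

(b) Now

$$P^n = \begin{cases} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} & n = 1, 3, 5, \ldots \\[4mm] \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} & n = 2, 4, 6, \ldots \end{cases}$$

and $\lim_{n\to\infty} P^n$ does not exist.

Consider a Markov chain with two states and transition probability matrix

$$P = \begin{bmatrix} \tfrac34 & \tfrac14 \\ \tfrac12 & \tfrac12 \end{bmatrix}$$

(a) Find the stationary distribution $\hat{\mathbf{p}}$ of the chain.
(b) Find $\lim_{n\to\infty} P^n$.
(c) Find $\lim_{n\to\infty} P^n$ by first evaluating $P^n$.

(a) By definition (5.52), we have

$$\hat{\mathbf{p}}P = \hat{\mathbf{p}} \qquad \text{or} \qquad [p_1 \quad p_2]\begin{bmatrix} \tfrac34 & \tfrac14 \\ \tfrac12 & \tfrac12 \end{bmatrix} = [p_1 \quad p_2]$$

which yields

$$\tfrac34 p_1 + \tfrac12 p_2 = p_1 \qquad\qquad \tfrac14 p_1 + \tfrac12 p_2 = p_2$$

Each of these equations is equivalent to $p_1 = 2p_2$. Since $p_1 + p_2 = 1$, we obtain

$$\hat{\mathbf{p}} = [\tfrac23 \quad \tfrac13]$$

(b) Since the Markov chain is regular, by Eq. (5.53), we obtain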

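$$\lim_{n\to\infty} P^n = \begin{bmatrix} \tfrac23 & \tfrac13 \\ \tfrac23 & \tfrac13 \end{bmatrix}$$

(c) Setting $a = \tfrac14$ and $b = \tfrac12$ in Eq. (5.120) (Prob. 5.30), we get

$$P^n = \begin{bmatrix} \tfrac23 & \tfrac13 \\ \tfrac23 & \tfrac13 \end{bmatrix} + \left(\tfrac14\right)^n\begin{bmatrix} \tfrac13 & -\tfrac13 \\ -\tfrac23 & \tfrac23 \end{bmatrix}$$

Since $\lim_{n\to\infty}\left(\tfrac14\right)^n = 0$, we obtain

$$\lim_{n\to\infty} P^n = \begin{bmatrix} \tfrac23 & \tfrac13 \\ \tfrac23 & \tfrac13 \end{bmatrix}$$

POISSON PROCESSES

Let $T_n$ denote the arrival time of the $n$th customer at a service station. Let $Z_n$ denote the time interval between the arrival of the $n$th customer and the $(n - 1)$st customer; that is,

$$Z_n = T_n - T_{n-1} \qquad n \ge 1 \tag{5.149}$$

and $T_0 = 0$. Let $\{X(t), t \ge 0\}$ be the counting process associated with $\{T_n, n \ge 0\}$. Show that if $X(t)$ has stationary increments, then $Z_n$, $n = 1, 2, \ldots$, are identically distributed r.v.'s.

We have

$$P(Z_n > z) = 1 - P(Z_n \le z) = 1 - F_{Z_n}(z)$$

By Eq. (5.149),

$$P(Z_n > z) = P(T_n - T_{n-1} > z) = P(T_n > T_{n-1} + z)$$

Suppose that the observed value of $T_{n-1}$ is $t_{n-1}$. The event $(T_n > T_{n-1} + z \mid T_{n-1} = t_{n-1})$ occurs if and only if $X(t)$ does not change count during the time interval $(t_{n-1}, t_{n-1} + z)$ (Fig. 5-13). Thus,

$$P(Z_n > z \mid T_{n-1} = t_{n-1}) = P(T_n > T_{n-1} + z \mid T_{n-1} = t_{n-1}) = P[X(t_{n-1} + z) - X(t_{n-1}) = 0] \tag{5.150}$$

Since $X(t)$ has stationary increments, the probability on the right-hand side of Eq. (5.150) is a function only of the time difference $z$. Thus

$$P(Z_n > z \mid T_{n-1} = t_{n-1}) = P[X(z) = 0] \tag{5.151}$$

which shows that the conditional distribution function on the left-hand side of Eq. (5.151) is independent of the particular value of $n$ in this case, and hence we have

$$F_{Z_n}(z) = P(Z_n \le z) = 1 - P[X(z) = 0] \tag{5.152}$$

which shows that the cdf of $Z_n$ is independent of $n$. Thus we conclude that the $Z_n$'s are identically distributed r.v.'s.

[Fig. 5-13]

Show that Definition 5.6.2 implies Definition 5.6.1.

Let $p_n(t) = P[X(t) = n]$. Then, by condition 2 of Definition 5.6.2, we have

$$p_0(t + \Delta t) = P[X(t + \Delta t) = 0] = P[X(t) = 0,\ X(t + \Delta t) - X(t) = 0] = P[X(t) = 0]\,P[X(t + \Delta t) - X(t) = 0]$$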

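Now, by Eq. (5.59), we have

$$P[X(t + \Delta t) - X(t) = 0] = 1 - \lambda\,\Delta t + o(\Delta t)$$

Thus,

$$p_0(t + \Delta t) = p_0(t)[1 - \lambda\,\Delta t + o(\Delta t)] \qquad \text{or} \qquad \frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda p_0(t) + \frac{o(\Delta t)}{\Delta t}$$

Letting $\Delta t \to 0$, and by Eq. (5.58), we obtain

$$p_0'(t) = -\lambda p_0(t) \tag{5.153}$$

Solving the above differential equation, we get

$$p_0(t) = ke^{-\lambda t}$$

where $k$ is an integration constant. Since $p_0(0) = P[X(0) = 0] = 1$, we obtain

$$p_0(t) = e^{-\lambda t} \tag{5.154}$$

Similarly, for $n > 0$,

$$p_n(t + \Delta t) = P[X(t + \Delta t) = n] = P[X(t) = n,\ X(t + \Delta t) - X(t) = 0] + P[X(t) = n - 1,\ X(t + \Delta t) - X(t) = 1] + \sum_{k=2}^{n} P[X(t) = n - k,\ X(t + \Delta t) - X(t) = k]$$

Now, by condition 4 of Definition 5.6.2, the last term in the above expression is $o(\Delta t)$. Thus, by conditions 2 and 3 of Definition 5.6.2, we have

$$p_n(t + \Delta t) = p_n(t)[1 - \lambda\,\Delta t + o(\Delta t)] + p_{n-1}(t)[\lambda\,\Delta t + o(\Delta t)] + o(\Delta t)$$

Thus

$$\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} = -\lambda p_n(t) + \lambda p_{n-1}(t) + \frac{o(\Delta t)}{\Delta t}$$

and letting $\Delta t \to 0$ yields

$$p_n'(t) + \lambda p_n(t) = \lambda p_{n-1}(t) \tag{5.155}$$

Multiplying both sides by $e^{\lambda t}$, we get

$$e^{\lambda t}[p_n'(t) + \lambda p_n(t)] = \lambda e^{\lambda t}p_{n-1}(t)$$

Hence

$$\frac{d}{dt}[e^{\lambda t}p_n(t)] = \lambda e^{\lambda t}p_{n-1}(t) \tag{5.156}$$

Then by Eq. (5.154), we have

$$\frac{d}{dt}[e^{\lambda t}p_1(t)] = \lambda \qquad \text{or} \qquad p_1(t) = (\lambda t + c)e^{-\lambda t}$$

where $c$ is an integration constant. Since $p_1(0) = P[X(0) = 1] = 0$, we obtain

$$p_1(t) = \lambda te^{-\lambda t} \tag{5.157}$$

To show that

$$p_n(t) = e^{-\lambda t}\frac{(\lambda t)^n}{n!} \tag{5.158}$$

we use mathematical induction. Assume that it is true for $n - 1$; that is,

$$p_{n-1}(t) = e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n - 1)!}$$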

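Substituting the above expression into Eq. (5.156), we have

$$\frac{d}{dt}[e^{\lambda t}p_n(t)] = \frac{\lambda^n t^{n-1}}{(n - 1)!}$$

Integrating, we get

$$e^{\lambda t}p_n(t) = \frac{(\lambda t)^n}{n!} + c_1$$

Since $p_n(0) = 0$, $c_1 = 0$, and we obtain

$$p_n(t) = e^{-\lambda t}\frac{(\lambda t)^n}{n!}$$

which is Eq. (5.55) of Definition 5.6.1. Thus we conclude that Definition 5.6.2 implies Definition 5.6.1.

Verify Eq. (5.59).

We note first that $X(t)$ can assume only nonnegative integer values; therefore, the same is true for the counting increment $X(t + \Delta t) - X(t)$. Thus, summing over all possible values of the increment, we get

$$\sum_{k=0}^{\infty} P[X(t + \Delta t) - X(t) = k] = P[X(t + \Delta t) - X(t) = 0] + P[X(t + \Delta t) - X(t) = 1] + P[X(t + \Delta t) - X(t) \ge 2] = 1$$

Substituting conditions 3 and 4 of Definition 5.6.2 into the above equation, we obtain

$$P[X(t + \Delta t) - X(t) = 0] = 1 - \lambda\,\Delta t + o(\Delta t)$$

(a) Using the Poisson probability distribution in Eq. (5.158), obtain an analytical expression for the correction term $o(\Delta t)$ in the expression (condition 3 of Definition 5.6.2)

$$P[X(t + \Delta t) - X(t) = 1] = \lambda\,\Delta t + o(\Delta t) \tag{5.159}$$

(b) Show that this correction term does have the property of Eq. (5.58); that is,

$$\lim_{\Delta t\to 0}\frac{o(\Delta t)}{\Delta t} = 0$$

(a) Since the Poisson process $X(t)$ has stationary increments, Eq. (5.159) can be rewritten as

$$P[X(\Delta t) = 1] = p_1(\Delta t) = \lambda\,\Delta t + o(\Delta t) \tag{5.160}$$

Using Eq. (5.158) [or Eq. (5.157)], we have

$$p_1(\Delta t) = \lambda\,\Delta t\,e^{-\lambda\Delta t} = \lambda\,\Delta t\,(1 + e^{-\lambda\Delta t} - 1) = \lambda\,\Delta t + \lambda\,\Delta t\,(e^{-\lambda\Delta t} - 1)$$

Equating the above expression with Eq. (5.160), we get

$$\lambda\,\Delta t + o(\Delta t) = \lambda\,\Delta t + \lambda\,\Delta t\,(e^{-\lambda\Delta t} - 1)$$

from which we obtain

$$o(\Delta t) = \lambda\,\Delta t\,(e^{-\lambda\Delta t} - 1) \tag{5.161}$$

(b) From Eq. (5.161), we have

$$\lim_{\Delta t\to 0}\frac{o(\Delta t)}{\Delta t} = \lim_{\Delta t\to 0}\frac{\lambda\,\Delta t\,(e^{-\lambda\Delta t} - 1)}{\Delta t} = \lim_{\Delta t\to 0}\lambda(e^{-\lambda\Delta t} - 1) = 0$$

Find the autocorrelation function $R_X(t, s)$ and the autocovariance function $K_X(t, s)$ of a Poisson process $X(t)$ with rate $\lambda$.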

From Eqs. (5.56) and (5.57),

$$E[X(t)] = \lambda t \qquad \text{Var}[X(t)] = \lambda t$$

Now, the Poisson process $X(t)$ is a random process with stationary independent increments and $X(0) = 0$. Thus, by Eq. (5.103) (Prob. 5.23), we obtain

$$K_X(t, s) = \sigma_1^2\min(t, s) = \lambda\min(t, s) \tag{5.162}$$

since $\sigma_1^2 = \text{Var}[X(1)] = \lambda$. Next, since $E[X(t)]E[X(s)] = \lambda^2 ts$, by Eq. (5.10), we obtain

$$R_X(t, s) = \lambda\min(t, s) + \lambda^2 ts \tag{5.163}$$

Show that the time intervals between successive events (or interarrival times) in a Poisson process $X(t)$ with rate $\lambda$ are independent and identically distributed exponential r.v.'s with parameter $\lambda$.

Let $Z_1, Z_2, \ldots$ be the r.v.'s representing the lengths of interarrival times in the Poisson process $X(t)$. First, notice that $\{Z_1 > t\}$ takes place if and only if no event of the Poisson process occurs in the interval $(0, t)$, and thus by Eq. (5.154),

$$P(Z_1 > t) = P[X(t) = 0] = e^{-\lambda t} \tag{5.164}$$

or

$$F_{Z_1}(t) = P(Z_1 \le t) = 1 - e^{-\lambda t}$$

Hence $Z_1$ is an exponential r.v. with parameter $\lambda$ [Eq. (2.49)]. Let $f_1(t)$ be the pdf of $Z_1$. Then we have

$$P(Z_2 > t) = \int P(Z_2 > t \mid Z_1 = \tau)f_1(\tau)\,d\tau = \int P[X(t + \tau) - X(\tau) = 0]f_1(\tau)\,d\tau = e^{-\lambda t}\int f_1(\tau)\,d\tau = e^{-\lambda t}$$

which indicates that $Z_2$ is also an exponential r.v. with parameter $\lambda$ and is independent of $Z_1$. Repeating the same argument, we conclude that $Z_1, Z_2, \ldots$ are iid exponential r.v.'s with parameter $\lambda$.

Let $T_n$ denote the time of the $n$th event of a Poisson process $X(t)$ with rate $\lambda$. Show that $T_n$ is a gamma r.v. with parameters $(n, \lambda)$.

Clearly,

$$T_n = Z_1 + Z_2 + \cdots + Z_n$$

where $Z_n$, $n = 1, 2, \ldots$, are the interarrival times defined by Eq. (5.149). From Prob. 5.53, we know that the $Z_n$ are iid exponential r.v.'s with parameter $\lambda$. Now, using the result of Prob. 4.33, we see that $T_n$ is a gamma r.v. with parameters $(n, \lambda)$, and its pdf is given by [Eq. (2.76)]:

$$f_{T_n}(t) = \begin{cases} \lambda e^{-\lambda t}\dfrac{(\lambda t)^{n-1}}{(n - 1)!} & t > 0 \\[2mm] 0 & t < 0 \end{cases}$$

The random process $\{T_n, n \ge 1\}$ is often called an arrival process.

Suppose $t$ is not a point at which an event occurs in a Poisson process $X(t)$ with rate $\lambda$. Let $W(t)$ be the r.v. representing the time until the next occurrence of an event. Show that the distribution of $W(t)$ is independent of $t$ and $W(t)$ is an exponential r.v. with parameter $\lambda$.
Let $s$ $(0 \le s < t)$ be the point at which the last event [say the $(n - 1)$st event] occurred (Fig. 5-14). The event $\{W(t) > \tau\}$ is equivalent to the event

$$\{Z_n > t - s + \tau \mid Z_n > t - s\}$$

[Fig. 5-14]

Thus, using Eq. (5.164), we have

$$P[W(t) > \tau] = P(Z_n > t - s + \tau \mid Z_n > t - s) = \frac{P(Z_n > t - s + \tau)}{P(Z_n > t - s)} = \frac{e^{-\lambda(t - s + \tau)}}{e^{-\lambda(t - s)}} = e^{-\lambda\tau}$$

and

$$P[W(t) \le \tau] = 1 - e^{-\lambda\tau} \tag{5.166}$$

which indicates that $W(t)$ is an exponential r.v. with parameter $\lambda$ and is independent of $t$. Note that $W(t)$ is often called a waiting time.

Patients arrive at the doctor's office according to a Poisson process with rate $\lambda = \tfrac{1}{10}$ per minute. The doctor will not see a patient until at least three patients are in the waiting room. Find the expected waiting time until the first patient is admitted to see the doctor. What is the probability that nobody is admitted to see the doctor in the first hour?

Let $T_n$ denote the arrival time of the $n$th patient at the doctor's office. Then

$$T_n = Z_1 + Z_2 + \cdots + Z_n$$

where $Z_n$, $n = 1, 2, \ldots$, are iid exponential r.v.'s with parameter $\lambda = \tfrac{1}{10}$. By Eqs. (4.108) and (2.50),

$$E(T_n) = \sum_{i=1}^{n} E(Z_i) = n\,\frac{1}{\lambda}$$

The expected waiting time until the first patient is admitted to see the doctor is

$$E(T_3) = 3(10) = 30 \text{ minutes}$$

Let $X(t)$ be the Poisson process with parameter $\lambda = \tfrac{1}{10}$. The probability that nobody is admitted to see the doctor in the first hour is the same as the probability that at most two patients arrive in the first 60 minutes. Thus, by Eq. (5.55),

$$P[X(60) - X(0) \le 2] = P[X(60) - X(0) = 0] + P[X(60) - X(0) = 1] + P[X(60) - X(0) = 2]$$

$$= e^{-60/10} + e^{-60/10}\left(\frac{60}{10}\right) + e^{-60/10}\,\frac{1}{2}\left(\frac{60}{10}\right)^2 = e^{-6}(1 + 6 + 18) \approx 0.062$$

Let $T_n$ denote the time of the $n$th event of a Poisson process $X(t)$ with rate $\lambda$. Suppose that one event has occurred in the interval $(0, t)$. Show that the conditional distribution of arrival time $T_1$ is uniform over $(0, t)$.

For $\tau \le t$,

$$P[T_1 \le \tau \mid X(t) = 1] = \frac{P[T_1 \le \tau,\ X(t) = 1]}{P[X(t) = 1]} = \frac{P[X(\tau) = 1,\ X(t) - X(\tau) = 0]}{P[X(t) = 1]} = \frac{P[X(\tau) = 1]\,P[X(t) - X(\tau) = 0]}{P[X(t) = 1]} = \frac{\lambda\tau e^{-\lambda\tau}e^{-\lambda(t - \tau)}}{\lambda te^{-\lambda t}} = \frac{\tau}{t}$$

which indicates that $T_1$ is uniform over $(0, t)$ [see Eq. (2.45)].
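The two numbers in the waiting-room problem (patients arriving at a rate of one per 10 minutes, admission once three are present) can be reproduced in a few lines. This is a minimal sketch using only the standard library; the names `poisson_pmf`, `mean_interarrival`, `expected_wait`, and `p_nobody_admitted` are illustrative and not from the text.

```python
from math import exp, factorial

def poisson_pmf(n, mean):
    """P[X = n] for a Poisson r.v. with the given mean, as in Eq. (5.55) with mean = lambda * t."""
    return exp(-mean) * mean ** n / factorial(n)

mean_interarrival = 10.0               # minutes, i.e. 1/lambda with lambda = 1/10 per minute
expected_wait = 3 * mean_interarrival  # E(T_3) = 3 * (1/lambda)

mean_count = 60 / mean_interarrival    # lambda * t for the first hour
# Nobody is admitted in the first hour exactly when at most two patients arrive.
p_nobody_admitted = sum(poisson_pmf(n, mean_count) for n in range(3))

print(expected_wait)
print(round(p_nobody_admitted, 3))
```

The sum of the three Poisson terms equals $e^{-6}(1 + 6 + 18) = 25e^{-6}$, which is how the value of about 0.062 arises.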

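Consider a Poisson process $X(t)$ with rate $\lambda$, and suppose that each time an event occurs, it is classified as either a type 1 or a type 2 event. Suppose further that the event is classified as a type 1 event with probability $p$ and a type 2 event with probability $1 - p$. Let $X_1(t)$ and $X_2(t)$ denote the number of type 1 and type 2 events, respectively, occurring in $(0, t)$. Show that $\{X_1(t), t \ge 0\}$ and $\{X_2(t), t \ge 0\}$ are both Poisson processes with rates $\lambda p$ and $\lambda(1 - p)$, respectively. Furthermore, the two processes are independent.

We have

$$X(t) = X_1(t) + X_2(t)$$

First we calculate the joint probability $P[X_1(t) = k, X_2(t) = m]$. Note that

$$P[X_1(t) = k, X_2(t) = m] = \sum_{n=0}^{\infty} P[X_1(t) = k, X_2(t) = m \mid X(t) = n]\,P[X(t) = n]$$

and

$$P[X_1(t) = k, X_2(t) = m \mid X(t) = n] = 0 \qquad \text{when } n \ne k + m$$

Thus, using Eq. (5.158), we obtain

$$P[X_1(t) = k, X_2(t) = m] = P[X_1(t) = k, X_2(t) = m \mid X(t) = k + m]\,e^{-\lambda t}\frac{(\lambda t)^{k+m}}{(k + m)!}$$

Now, given that $k + m$ events occurred, since each event has probability $p$ of being a type 1 event and probability $1 - p$ of being a type 2 event, it follows that

$$P[X_1(t) = k, X_2(t) = m \mid X(t) = k + m] = \binom{k + m}{k}p^k(1 - p)^m$$

Thus,

$$P[X_1(t) = k, X_2(t) = m] = \binom{k + m}{k}p^k(1 - p)^m e^{-\lambda t}\frac{(\lambda t)^{k+m}}{(k + m)!} = e^{-\lambda pt}\frac{(\lambda pt)^k}{k!}\,e^{-\lambda(1 - p)t}\frac{[\lambda(1 - p)t]^m}{m!} \tag{5.169}$$

Then

$$P[X_1(t) = k] = \sum_{m=0}^{\infty} P[X_1(t) = k, X_2(t) = m] = e^{-\lambda pt}\frac{(\lambda pt)^k}{k!} \tag{5.170}$$

which indicates that $X_1(t)$ is a Poisson process with rate $\lambda p$. Similarly, we can obtain

$$P[X_2(t) = m] = e^{-\lambda(1 - p)t}\frac{[\lambda(1 - p)t]^m}{m!} \tag{5.171}$$

and so $X_2(t)$ is a Poisson process with rate $\lambda(1 - p)$. Finally, from Eqs. (5.170), (5.171), and (5.169), we see that

$$P[X_1(t) = k, X_2(t) = m] = P[X_1(t) = k]\,P[X_2(t) = m]$$

Hence, $X_1(t)$ and $X_2(t)$ are independent.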
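The factorization at the heart of this argument, a binomial split of a Poisson count equalling a product of two Poisson probabilities, can be checked numerically. The sketch below is illustrative: the function names and the parameter values $\lambda = 2$, $p = 0.3$, $t = 1.5$ are arbitrary assumptions, and the check covers only a finite grid of $(k, m)$ values.

```python
from math import comb, exp, factorial, isclose

def poisson_pmf(n, mean):
    """P[X = n] for a Poisson r.v. with the given mean."""
    return exp(-mean) * mean ** n / factorial(n)

def joint_pmf(k, m, lam, p, t):
    """Binomial classification of k + m events times the Poisson probability of k + m events."""
    return comb(k + m, k) * p ** k * (1 - p) ** m * poisson_pmf(k + m, lam * t)

def product_pmf(k, m, lam, p, t):
    """Product of the Poisson pmfs with rates lam * p and lam * (1 - p)."""
    return poisson_pmf(k, lam * p * t) * poisson_pmf(m, lam * (1 - p) * t)

lam, p, t = 2.0, 0.3, 1.5   # arbitrary illustrative values
all_match = all(
    isclose(joint_pmf(k, m, lam, p, t), product_pmf(k, m, lam, p, t), rel_tol=1e-9)
    for k in range(8) for m in range(8)
)
print(all_match)
```

Agreement on the grid mirrors the algebraic identity: $(\lambda t)^{k+m} p^k (1-p)^m = (\lambda p t)^k [\lambda (1-p) t]^m$ and $e^{-\lambda t} = e^{-\lambda p t} e^{-\lambda (1-p) t}$, while the factorials cancel against the binomial coefficient.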

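WIENER PROCESSES

Let $X_1, \ldots, X_n$ be jointly normal r.v.'s. Show that the joint characteristic function of $X_1, \ldots, X_n$ is given by

$$\Psi_{X_1 \cdots X_n}(\omega_1, \ldots, \omega_n) = \exp\left(j\sum_{i=1}^{n}\omega_i\mu_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{n}\omega_i\omega_k\sigma_{ik}\right) \tag{5.172}$$

where $\mu_i = E(X_i)$ and $\sigma_{ik} = \text{Cov}(X_i, X_k)$.

Let

$$Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n$$

By definition (4.50), the characteristic function of $Y$ is

$$\Psi_Y(\omega) = E[e^{j\omega(a_1X_1 + \cdots + a_nX_n)}] = \Psi_{X_1 \cdots X_n}(\omega a_1, \ldots, \omega a_n) \tag{5.173}$$

Now, by the results of Prob. 4.55, we see that $Y$ is a normal r.v. with mean and variance given by [Eqs. (4.108) and (4.111)]

$$\mu_Y = E(Y) = \sum_{i=1}^{n} a_i\mu_i \qquad\qquad \sigma_Y^2 = \text{Var}(Y) = \sum_{i=1}^{n}\sum_{k=1}^{n} a_ia_k\sigma_{ik}$$

Thus, by Eq. (4.125),

$$\Psi_Y(\omega) = \exp\left(j\omega\mu_Y - \frac{1}{2}\sigma_Y^2\omega^2\right) = \exp\left(j\omega\sum_{i=1}^{n} a_i\mu_i - \frac{1}{2}\omega^2\sum_{i=1}^{n}\sum_{k=1}^{n} a_ia_k\sigma_{ik}\right) \tag{5.176}$$

Equating Eqs. (5.176) and (5.173) and setting $\omega = 1$, we get

$$\Psi_{X_1 \cdots X_n}(a_1, \ldots, a_n) = \exp\left(j\sum_{i=1}^{n} a_i\mu_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{n} a_ia_k\sigma_{ik}\right)$$

By replacing the $a_i$'s with $\omega_i$'s, we obtain Eq. (5.172); that is,

$$\Psi_{X_1 \cdots X_n}(\omega_1, \ldots, \omega_n) = \exp\left(j\sum_{i=1}^{n}\omega_i\mu_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{n}\omega_i\omega_k\sigma_{ik}\right)$$

Let

$$\boldsymbol{\omega} = \begin{bmatrix}\omega_1 \\ \vdots \\ \omega_n\end{bmatrix} \qquad \boldsymbol{\mu} = \begin{bmatrix}\mu_1 \\ \vdots \\ \mu_n\end{bmatrix} \qquad K = [\sigma_{ik}]$$

Then we can write

$$\sum_{i=1}^{n}\omega_i\mu_i = \boldsymbol{\omega}^T\boldsymbol{\mu} \qquad\qquad \sum_{i=1}^{n}\sum_{k=1}^{n}\omega_i\omega_k\sigma_{ik} = \boldsymbol{\omega}^TK\boldsymbol{\omega}$$

and Eq. (5.172) can be expressed more compactly as

$$\Psi_{\mathbf{X}}(\boldsymbol{\omega}) = \exp\left(j\boldsymbol{\omega}^T\boldsymbol{\mu} - \tfrac12\boldsymbol{\omega}^TK\boldsymbol{\omega}\right) \tag{5.177}$$

Let $X_1, \ldots, X_n$ be jointly normal r.v.'s. Let

$$Y_i = \sum_{k=1}^{n} a_{ik}X_k \qquad i = 1, \ldots, m \tag{5.178}$$

where $a_{ik}$ $(i = 1, \ldots, m;\ k = 1, \ldots, n)$ are constants. Show that $Y_1, \ldots, Y_m$ are also jointly normal r.v.'s.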

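Let

$$\mathbf{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} \qquad \mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_m \end{bmatrix} \qquad A = [a_{ik}]$$

Then Eq. (5.178) can be expressed as

$$\mathbf{Y} = A\mathbf{X}$$

Then the characteristic function for $\mathbf{Y}$ can be written as

$$\Psi_{\mathbf{Y}}(\omega_1, \ldots, \omega_m) = E(e^{j\boldsymbol{\omega}^T\mathbf{Y}}) = E(e^{j\boldsymbol{\omega}^TA\mathbf{X}}) = E[e^{j(A^T\boldsymbol{\omega})^T\mathbf{X}}] = \Psi_{\mathbf{X}}(A^T\boldsymbol{\omega})$$

Since $\mathbf{X}$ is a normal random vector, by Eq. (5.177) we can write

$$\Psi_{\mathbf{X}}(A^T\boldsymbol{\omega}) = \exp[j(A^T\boldsymbol{\omega})^T\boldsymbol{\mu}_X - \tfrac12(A^T\boldsymbol{\omega})^TK_X(A^T\boldsymbol{\omega})] = \exp[j\boldsymbol{\omega}^TA\boldsymbol{\mu}_X - \tfrac12\boldsymbol{\omega}^TAK_XA^T\boldsymbol{\omega}]$$

Thus

$$\Psi_{\mathbf{Y}}(\omega_1, \ldots, \omega_m) = \exp(j\boldsymbol{\omega}^T\boldsymbol{\mu}_Y - \tfrac12\boldsymbol{\omega}^TK_Y\boldsymbol{\omega}) \tag{5.180}$$

where

$$\boldsymbol{\mu}_Y = A\boldsymbol{\mu}_X \qquad K_Y = AK_XA^T$$

Comparing Eqs. (5.177) and (5.180), we see that Eq. (5.180) is the characteristic function of a normal random vector $\mathbf{Y}$. Hence, we conclude that $Y_1, \ldots, Y_m$ are also jointly normal r.v.'s.

Note that on the basis of the above result, we can say that a random process $\{X(t), t \in T\}$ is a normal process if every finite linear combination of the r.v.'s $X(t_i)$, $t_i \in T$, is normally distributed.

Show that a Wiener process $X(t)$ is a normal process.

Consider an arbitrary linear combination

$$\sum_{i=1}^{n} a_iX(t_i) = a_1X(t_1) + a_2X(t_2) + \cdots + a_nX(t_n)$$

where $0 \le t_1 < \cdots < t_n$ and the $a_i$ are real constants. Now we write

$$\sum_{i=1}^{n} a_iX(t_i) = (a_1 + \cdots + a_n)[X(t_1) - X(0)] + (a_2 + \cdots + a_n)[X(t_2) - X(t_1)] + \cdots + (a_{n-1} + a_n)[X(t_{n-1}) - X(t_{n-2})] + a_n[X(t_n) - X(t_{n-1})] \tag{5.183}$$

Now from conditions 1 and 2 of Definition 5.7.1, the right-hand side of Eq. (5.183) is a linear combination of independent normal r.v.'s. Thus, based on the result of Prob. 5.60, the left-hand side of Eq. (5.183) is also a normal r.v.; that is, every finite linear combination of the r.v.'s $X(t_i)$ is a normal r.v. Thus we conclude that the Wiener process $X(t)$ is a normal process.

A random process $\{X(t), t \in T\}$ is said to be continuous in probability if for every $\varepsilon > 0$ and $t \in T$,

$$\lim_{h\to 0} P\{|X(t + h) - X(t)| > \varepsilon\} = 0$$

Show that a Wiener process $X(t)$ is continuous in probability.

From the Chebyshev inequality (2.97), we have

$$P\{|X(t + h) - X(t)| > \varepsilon\} \le \frac{\text{Var}[X(t + h) - X(t)]}{\varepsilon^2} \qquad \varepsilon > 0$$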

Since $X(t)$ has stationary increments, we have

$$\text{Var}[X(t + h) - X(t)] = \text{Var}[X(h)] = \sigma^2h$$

in view of Eq. (5.63). Hence, since the bound $\sigma^2h/\varepsilon^2$ tends to 0 as $h \to 0$,

$$\lim_{h\to 0} P\{|X(t + h) - X(t)| > \varepsilon\} = 0$$

Thus the Wiener process $X(t)$ is continuous in probability.

Supplementary Problems

Consider a random process $X(n) = \{X_n, n \ge 1\}$, where

$$X_n = Z_1 + Z_2 + \cdots + Z_n$$

and the $Z_n$ are iid r.v.'s with zero mean and variance $\sigma^2$. Is $X(n)$ stationary?

Ans. No.

Consider a random process $X(t)$ defined by

$$X(t) = Y\cos(\omega t + \Theta)$$

where $Y$ and $\Theta$ are independent r.v.'s and are uniformly distributed over $(-A, A)$ and $(-\pi, \pi)$, respectively.
(a) Find the mean of $X(t)$.
(b) Find the autocorrelation function $R_X(t, s)$ of $X(t)$.

Ans. (a) $E[X(t)] = 0$; (b) $R_X(t, s) = \tfrac16A^2\cos\omega(t - s)$

Suppose that a random process $X(t)$ is wide-sense stationary with autocorrelation

$$R_X(t, t + \tau) = e^{-|\tau|/2}$$

(a) Find the second moment of the r.v. $X(5)$.
(b) Find the second moment of the r.v. $X(5) - X(3)$.

Ans. (a) $E[X^2(5)] = 1$; (b) $E\{[X(5) - X(3)]^2\} = 2(1 - e^{-1})$

Consider a random process $X(t)$ defined by

$$X(t) = U\cos t + (V + 1)\sin t \qquad -\infty < t < \infty$$

where $U$ and $V$ are independent r.v.'s for which

$$E(U) = E(V) = 0 \qquad E(U^2) = E(V^2) = 1$$

(a) Find the autocovariance function $K_X(t, s)$ of $X(t)$.
(b) Is $X(t)$ WSS?

Ans. (a) $K_X(t, s) = \cos(s - t)$; (b) No.

Consider the random processes

$$X(t) = A_0\cos(\omega_0t + \Theta) \qquad Y(t) = A_1\cos(\omega_1t + \Phi)$$

where $A_0$, $A_1$, $\omega_0$, and $\omega_1$ are constants, and r.v.'s $\Theta$ and $\Phi$ are independent and uniformly distributed over $(-\pi, \pi)$.

(a) Find the cross-correlation function $R_{XY}(t, t + \tau)$ of $X(t)$ and $Y(t)$.
(b) Repeat (a) if $\Theta = \Phi$.

Ans. (a) $R_{XY}(t, t + \tau) = 0$; (b) $R_{XY}(t, t + \tau) = \dfrac{A_0A_1}{2}\cos[(\omega_1 - \omega_0)t + \omega_1\tau]$

Given a Markov chain $\{X_n, n \ge 0\}$, find the joint pmf

$$P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n)$$

Hint: Use Eq. (5.32).

Let $\{X_n, n \ge 0\}$ be a homogeneous Markov chain. Show that

$$P(X_{n+1} = k_1, \ldots, X_{n+m} = k_m \mid X_0 = i_0, \ldots, X_n = i) = P(X_1 = k_1, \ldots, X_m = k_m \mid X_0 = i)$$

Hint: Use the Markov property (5.27) and the homogeneity property.

Verify Eq. (5.37).

Hint: Write Eq. (5.39) in terms of components.

Find $P^n$ for the following transition probability matrices:

[transition probability matrices not legible in the source]

A certain product is made by two companies, A and B, that control the entire market. Currently, A and B have 60 percent and 40 percent, respectively, of the total market. Each year, A loses $\tfrac23$ of its market share to B, while B loses $\tfrac12$ of its share to A. Find the relative proportion of the market that each holds after 2 years.

Ans. A has 43.3 percent and B has 56.7 percent.

Consider a Markov chain with state space $\{0, 1, 2\}$ and transition probability matrix

[transition probability matrix not legible in the source]

Is state 0 periodic?

Hint: Draw the state transition diagram.

Ans. No.

Verify Eq. (5.51).

Hint: Let $N = [N_{jk}]$, where $N_{jk}$ is the number of times the state $k$ $(\in B)$ is occupied until absorption takes place when $X(n)$ starts in state $j$ $(\in B)$. Then $T_j = \sum_{k=m+1}^{N} N_{jk}$; calculate $E(N_{jk})$.

Consider a Markov chain with transition probability matrix

[transition probability matrix not legible in the source]

Find the steady-state probabilities.

Ans. [values not legible in the source]

Let $X(t)$ be a Poisson process with rate $\lambda$. Find $E[X^2(t)]$.

Ans. $\lambda t + \lambda^2t^2$

Let $X(t)$ be a Poisson process with rate $\lambda$. Find $E\{[X(t) - X(s)]^2\}$ for $t > s$.

Hint: Use the independent stationary increments condition and the result of the preceding problem.

Ans. $\lambda(t - s) + \lambda^2(t - s)^2$

Let $X(t)$ be a Poisson process with rate $\lambda$. Find

$$P[X(t - d) = k \mid X(t) = j] \qquad d > 0$$

Ans. $\dfrac{j!}{k!\,(j - k)!}\left(\dfrac{t - d}{t}\right)^k\left(\dfrac{d}{t}\right)^{j-k}$

Let $T_n$ denote the time of the $n$th event of a Poisson process with rate $\lambda$. Find the variance of $T_n$.

Ans. $n/\lambda^2$

Assume that customers arrive at a bank in accordance with a Poisson process with rate $\lambda = 6$ per hour, and suppose that each customer is a man with probability $\tfrac12$ and a woman with probability $\tfrac12$. Now suppose that 10 men arrived in the first 2 hours. How many women would you expect to have arrived in the first 2 hours?

Ans. [value not legible in the source]

Let $X_1, \ldots, X_n$ be jointly normal r.v.'s. Let

$$Y_i = X_i + c_i \qquad i = 1, \ldots, n$$

where the $c_i$ are constants. Show that $Y_1, \ldots, Y_n$ are also jointly normal r.v.'s.

Hint: See Prob. 5.60.

Derive Eq. (5.63).

Hint: Use condition (1) of a Wiener process and Eq. (5.102).
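As a numerical check on the market-share supplementary problem, the relation $\mathbf{p}(n) = \mathbf{p}(0)P^n$ of Eq. (5.39) can be iterated directly. The sketch below is illustrative only: the function name `step` is hypothetical, and the yearly loss fractions $\tfrac23$ for A and $\tfrac12$ for B are an assumption, adopted because they reproduce the stated answer of 43.3 and 56.7 percent.

```python
from fractions import Fraction as F

def step(dist, P):
    """One application of p(n+1) = p(n) P for a row vector dist."""
    return [sum(dist[i] * P[i][j] for i in range(len(dist))) for j in range(len(P[0]))]

# Rows and columns ordered (A, B); assumed yearly loss fractions 2/3 (A) and 1/2 (B)
P = [[F(1, 3), F(2, 3)],
     [F(1, 2), F(1, 2)]]
dist = [F(3, 5), F(2, 5)]   # current shares: 60 percent and 40 percent

for _ in range(2):          # two years
    dist = step(dist, P)

print([round(float(100 * share), 1) for share in dist])
```

After the first year the shares are 40 and 60 percent, and after the second they are $13/30$ and $17/30$, which round to the percentages quoted in the answer.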

Chapter 6

Analysis and Processing of Random Processes

6.1 INTRODUCTION

In this chapter, we introduce the methods for analysis and processing of random processes. First, we introduce the definitions of stochastic continuity, stochastic derivatives, and stochastic integrals of random processes. Next, the notion of power spectral density is introduced. This concept enables us to study wide-sense stationary processes in the frequency domain and define a white noise process. The response of linear systems to random processes is then studied. Finally, orthogonal and spectral representations of random processes are presented.

6.2 CONTINUITY, DIFFERENTIATION, INTEGRATION

In this section, we shall consider only the continuous-time random processes.

A. Stochastic Continuity:

A random process $X(t)$ is said to be continuous in mean square or mean square (m.s.) continuous if

$$\lim_{\varepsilon\to 0} E\{[X(t + \varepsilon) - X(t)]^2\} = 0 \tag{6.1}$$

The random process $X(t)$ is m.s. continuous if and only if its autocorrelation function is continuous (Prob. 6.1). If $X(t)$ is WSS, then it is m.s. continuous if and only if its autocorrelation function $R_X(\tau)$ is continuous at $\tau = 0$. If $X(t)$ is m.s. continuous, then its mean is continuous; that is,

$$\lim_{\varepsilon\to 0}\mu_X(t + \varepsilon) = \mu_X(t) \tag{6.2}$$

which can be written as

$$\lim_{\varepsilon\to 0} E[X(t + \varepsilon)] = E[\lim_{\varepsilon\to 0} X(t + \varepsilon)]$$

Hence, if $X(t)$ is m.s. continuous, then we may interchange the ordering of the operations of expectation and limiting. Note that m.s. continuity of $X(t)$ does not imply that the sample functions of $X(t)$ are continuous. For instance, the Poisson process is m.s. continuous (Prob. 6.46), but sample functions of the Poisson process have a countably infinite number of discontinuities (see Fig. 5-2).

B. Stochastic Derivatives:

A random process $X(t)$ is said to have a m.s. derivative $X'(t)$ if

$$\mathop{\text{l.i.m.}}_{\varepsilon\to 0}\frac{X(t + \varepsilon) - X(t)}{\varepsilon} = X'(t)$$

where l.i.m. denotes limit in the mean (square); that is,

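$$\lim_{\varepsilon\to 0} E\left\{\left[\frac{X(t + \varepsilon) - X(t)}{\varepsilon} - X'(t)\right]^2\right\} = 0$$

The m.s. derivative of $X(t)$ exists if $\partial^2R_X(t, s)/\partial t\,\partial s$ exists (Prob. 6.6). If $X(t)$ has the m.s. derivative $X'(t)$, then its mean and autocorrelation function are given by

$$E[X'(t)] = \frac{d}{dt}E[X(t)] = \mu_X'(t) \tag{6.6}$$

$$R_{X'}(t, s) = \frac{\partial^2R_X(t, s)}{\partial t\,\partial s}$$

Equation (6.6) indicates that the operations of differentiation and expectation may be interchanged. If $X(t)$ is a normal random process for which the m.s. derivative $X'(t)$ exists, then $X'(t)$ is also a normal random process (Prob. 6.10).

C. Stochastic Integrals:

A m.s. integral of a random process $X(t)$ is defined by

$$Y(t) = \int_{t_0}^{t} X(\alpha)\,d\alpha = \mathop{\text{l.i.m.}}_{\Delta t_i\to 0}\sum_i X(t_i)\,\Delta t_i$$

where $t_0 < t_1 < \cdots < t$ and $\Delta t_i = t_{i+1} - t_i$. The m.s. integral of $X(t)$ exists if the following integral exists (Prob. 6.11):

$$\int_{t_0}^{t}\int_{t_0}^{t} R_X(\alpha, \beta)\,d\alpha\,d\beta$$

This implies that if $X(t)$ is m.s. continuous, then its m.s. integral $Y(t)$ exists (see Prob. 6.1). The mean and the autocorrelation function of $Y(t)$ are given by

$$\mu_Y(t) = E\left[\int_{t_0}^{t} X(\alpha)\,d\alpha\right] = \int_{t_0}^{t} E[X(\alpha)]\,d\alpha = \int_{t_0}^{t}\mu_X(\alpha)\,d\alpha \tag{6.10}$$

$$R_Y(t, s) = E\left[\int_{t_0}^{t} X(\alpha)\,d\alpha\int_{t_0}^{s} X(\beta)\,d\beta\right] = \int_{t_0}^{t}\int_{t_0}^{s} R_X(\alpha, \beta)\,d\beta\,d\alpha$$

Equation (6.10) indicates that the operations of integration and expectation may be interchanged. If $X(t)$ is a normal random process, then its integral $Y(t)$ is also a normal random process. This follows from the fact that $\sum_i X(t_i)\,\Delta t_i$ is a linear combination of the jointly normal r.v.'s (see Prob. 5.60).

6.3 POWER SPECTRAL DENSITIES

In this section we assume that all random processes are WSS.

A. Autocorrelation Functions:

The autocorrelation function of a continuous-time random process $X(t)$ is defined as [Eq. (5.7)]

$$R_X(\tau) = E[X(t)X(t + \tau)] \tag{6.12}$$

Properties of $R_X(\tau)$:

1. $R_X(-\tau) = R_X(\tau)$ (6.13)
2. $|R_X(\tau)| \le R_X(0)$ (6.14)
3. $R_X(0) = E[X^2(t)] \ge 0$ (6.15)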

Property 3 [Eq. (6.15)] is easily obtained by setting $\tau = 0$ in Eq. (6.12). If we assume that $X(t)$ is a voltage waveform across a 1-$\Omega$ resistor, then $E[X^2(t)]$ is the average value of power delivered to the 1-$\Omega$ resistor by $X(t)$. Thus, $E[X^2(t)]$ is often called the average power of $X(t)$. Properties 1 and 2 are verified in the solved problems. In the case of a discrete-time random process $X(n)$, the autocorrelation function of $X(n)$ is defined by

$$R_X(k) = E[X(n)X(n + k)] \tag{6.16}$$

Various properties of $R_X(k)$ similar to those of $R_X(\tau)$ can be obtained by replacing $\tau$ by $k$ in Eqs. (6.13) to (6.15).

B. Cross-Correlation Functions:

The cross-correlation function of two continuous-time jointly WSS random processes $X(t)$ and $Y(t)$ is defined by

$$R_{XY}(\tau) = E[X(t)Y(t + \tau)] \tag{6.17}$$

Properties of $R_{XY}(\tau)$:

1. $R_{XY}(-\tau) = R_{YX}(\tau)$ (6.18)
2. $|R_{XY}(\tau)| \le \sqrt{R_X(0)R_Y(0)}$ (6.19)
3. $|R_{XY}(\tau)| \le \tfrac12[R_X(0) + R_Y(0)]$ (6.20)

These properties are verified in the solved problems. Two processes $X(t)$ and $Y(t)$ are called (mutually) orthogonal if

$$R_{XY}(\tau) = 0 \qquad \text{for all } \tau \tag{6.21}$$

Similarly, the cross-correlation function of two discrete-time jointly WSS random processes $X(n)$ and $Y(n)$ is defined by

$$R_{XY}(k) = E[X(n)Y(n + k)] \tag{6.22}$$

and various properties of $R_{XY}(k)$ similar to those of $R_{XY}(\tau)$ can be obtained by replacing $\tau$ by $k$ in Eqs. (6.18) to (6.20).

C. Power Spectral Density:

The power spectral density (or power spectrum) $S_X(\omega)$ of a continuous-time random process $X(t)$ is defined as the Fourier transform of $R_X(\tau)$:

$$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)e^{-j\omega\tau}\,d\tau \tag{6.23}$$

Thus, taking the inverse Fourier transform of $S_X(\omega)$, we obtain

$$R_X(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)e^{j\omega\tau}\,d\omega \tag{6.24}$$

Equations (6.23) and (6.24) are known as the Wiener-Khinchin relations.

Properties of $S_X(\omega)$:

1. $S_X(\omega)$ is real and $S_X(\omega) \ge 0$.

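2. $S_X(-\omega) = S_X(\omega)$
3. $E[X^2(t)] = R_X(0) = \dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{\infty} S_X(\omega)\,d\omega$

Similarly, the power spectral density $S_X(\Omega)$ of a discrete-time random process $X(n)$ is defined as the Fourier transform of $R_X(k)$:

$$S_X(\Omega) = \sum_{k=-\infty}^{\infty} R_X(k)e^{-j\Omega k}$$

Thus, taking the inverse Fourier transform of $S_X(\Omega)$, we obtain

$$R_X(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_X(\Omega)e^{j\Omega k}\,d\Omega$$

Properties of $S_X(\Omega)$:

1. $S_X(\Omega + 2\pi) = S_X(\Omega)$ (6.30)
2. $S_X(\Omega)$ is real and $S_X(\Omega) \ge 0$.
3. $S_X(-\Omega) = S_X(\Omega)$
4. $E[X^2(n)] = R_X(0) = \dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} S_X(\Omega)\,d\Omega$

Note that property 1 [Eq. (6.30)] follows from the fact that $e^{-j\Omega k}$ is periodic with period $2\pi$. Hence it is sufficient to define $S_X(\Omega)$ only in the range $(-\pi, \pi)$.

D. Cross Power Spectral Densities:

The cross power spectral density (or cross power spectrum) $S_{XY}(\omega)$ of two continuous-time random processes $X(t)$ and $Y(t)$ is defined as the Fourier transform of $R_{XY}(\tau)$:

$$S_{XY}(\omega) = \int_{-\infty}^{\infty} R_{XY}(\tau)e^{-j\omega\tau}\,d\tau$$

Thus, taking the inverse Fourier transform of $S_{XY}(\omega)$, we get

$$R_{XY}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{XY}(\omega)e^{j\omega\tau}\,d\omega$$

Properties of $S_{XY}(\omega)$:

Unlike $S_X(\omega)$, which is a real-valued function of $\omega$, $S_{XY}(\omega)$, in general, is a complex-valued function.

1. $S_{XY}(\omega) = S_{YX}(-\omega)$
2. $S_{XY}(-\omega) = S_{XY}^*(\omega)$

Similarly, the cross power spectral density $S_{XY}(\Omega)$ of two discrete-time random processes $X(n)$ and $Y(n)$ is defined as the Fourier transform of $R_{XY}(k)$:

$$S_{XY}(\Omega) = \sum_{k=-\infty}^{\infty} R_{XY}(k)e^{-j\Omega k}$$

Thus, taking the inverse Fourier transform of $S_{XY}(\Omega)$, we get

$$R_{XY}(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{XY}(\Omega)e^{j\Omega k}\,d\Omega$$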

Properties of $S_{XY}(\Omega)$:

Unlike $S_X(\Omega)$, which is a real-valued function of $\Omega$, $S_{XY}(\Omega)$, in general, is a complex-valued function.

1. $S_{XY}(\Omega + 2\pi) = S_{XY}(\Omega)$
2. $S_{XY}(\Omega) = S_{YX}(-\Omega)$
3. $S_{XY}(-\Omega) = S_{XY}^*(\Omega)$

6.4 WHITE NOISE

A continuous-time white noise process, $W(t)$, is a WSS zero-mean continuous-time random process whose autocorrelation function is given by

$$R_W(\tau) = \sigma^2\delta(\tau) \tag{6.43}$$

where $\delta(\tau)$ is a unit impulse function (or Dirac $\delta$ function) defined by

$$\int_{-\infty}^{\infty}\delta(\tau)\phi(\tau)\,d\tau = \phi(0)$$

where $\phi(\tau)$ is any function continuous at $\tau = 0$. Taking the Fourier transform of Eq. (6.43), we obtain

$$S_W(\omega) = \sigma^2\int_{-\infty}^{\infty}\delta(\tau)e^{-j\omega\tau}\,d\tau = \sigma^2$$

which indicates that $W(t)$ has a constant power spectral density (hence the name white noise). Note that the average power of $W(t)$ is not finite.

Similarly, a WSS zero-mean discrete-time random process $W(n)$ is called a discrete-time white noise if its autocorrelation function is given by

$$R_W(k) = \sigma^2\delta(k) \tag{6.46}$$

where $\delta(k)$ is a unit impulse sequence (or unit sample sequence) defined by

$$\delta(k) = \begin{cases} 1 & k = 0 \\ 0 & k \ne 0 \end{cases}$$

Taking the Fourier transform of Eq. (6.46), we obtain

$$S_W(\Omega) = \sigma^2\sum_{k=-\infty}^{\infty}\delta(k)e^{-j\Omega k} = \sigma^2 \qquad -\pi < \Omega < \pi$$

Again the power spectral density of $W(n)$ is a constant. Note that $S_W(\Omega + 2\pi) = S_W(\Omega)$ and the average power of $W(n)$ is $\sigma^2 = \text{Var}[W(n)]$, which is finite.

6.5 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS

A. Linear Systems:

A system is a mathematical model of a physical process that relates the input (or excitation) signal $x$ to the output (or response) signal $y$. Then the system is viewed as a transformation (or mapping) of $x$ into $y$. This transformation is represented by the operator $T$ as (Fig. 6-1)

$$y = Tx \tag{6.49}$$

If $x$ and $y$ are continuous-time signals, then the system is called a continuous-time system, and if $x$ and $y$ are discrete-time signals, then the system is called a discrete-time system. If the operator $T$ is a

linear operator satisfying
$$T\{x_1 + x_2\} = Tx_1 + Tx_2 = y_1 + y_2 \qquad \text{(Additivity)}$$
$$T\{\alpha x\} = \alpha Tx = \alpha y \qquad \text{(Homogeneity)}$$
where $\alpha$ is a scalar number, then the system represented by $T$ is called a linear system.

Fig. 6-1 System.

A system is called time-invariant if a time shift in the input signal causes the same time shift in the output signal. Thus, for a continuous-time system,
$$T\{x(t - t_0)\} = y(t - t_0)$$
for any value of $t_0$, and for a discrete-time system,
$$T\{x(n - n_0)\} = y(n - n_0)$$
for any integer $n_0$. For a continuous-time linear time-invariant (LTI) system, Eq. (6.49) can be expressed as
$$y(t) = \int_{-\infty}^{\infty} h(\lambda)\,x(t - \lambda)\,d\lambda \tag{6.50}$$
where
$$h(t) = T\{\delta(t)\}$$
is known as the impulse response of a continuous-time LTI system. The right-hand side of Eq. (6.50) is commonly called the convolution integral of $h(t)$ and $x(t)$, denoted by $h(t) * x(t)$. For a discrete-time LTI system, Eq. (6.49) can be expressed as
$$y(n) = \sum_{i=-\infty}^{\infty} h(i)\,x(n - i) \tag{6.52}$$
where
$$h(n) = T\{\delta(n)\}$$
is known as the impulse response (or unit sample response) of a discrete-time LTI system. The right-hand side of Eq. (6.52) is commonly called the convolution sum of $h(n)$ and $x(n)$, denoted by $h(n) * x(n)$.

B. Response of a Continuous-Time Linear System to Random Input:

When the input to a continuous-time linear system represented by Eq. (6.49) is a random process $\{X(t), t \in T_x\}$, then the output will also be a random process $\{Y(t), t \in T_y\}$; that is,
$$T\{X(t), t \in T_x\} = \{Y(t), t \in T_y\}$$
For any input sample function $x_i(t)$, the corresponding output sample function is
$$y_i(t) = T\{x_i(t)\}$$
If the system is LTI, then by Eq. (6.50), we can write
$$Y(t) = \int_{-\infty}^{\infty} h(\lambda)\,X(t - \lambda)\,d\lambda \tag{6.56}$$
Note that Eq. (6.56) is a stochastic integral. Then

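$$E[Y(t)] = E\left[\int_{-\infty}^{\infty} h(\lambda)X(t - \lambda)\,d\lambda\right] = \int_{-\infty}^{\infty} h(\lambda)\,E[X(t - \lambda)]\,d\lambda \tag{6.57}$$
The autocorrelation function of $Y(t)$ is given by (Prob. 6.24)
$$R_Y(t, s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)\,R_X(t - \alpha, s - \beta)\,d\alpha\,d\beta \tag{6.58}$$
If the input $X(t)$ is WSS, then from Eq. (6.57),
$$E[Y(t)] = \mu_X\int_{-\infty}^{\infty} h(\lambda)\,d\lambda = \mu_X H(0) \tag{6.59}$$
where $H(0) = H(\omega)|_{\omega = 0}$ and $H(\omega)$ is the frequency response of the system defined by the Fourier transform of $h(t)$; that is,
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\,e^{-j\omega t}\,dt$$
The autocorrelation function of $Y(t)$ is, from Eq. (6.58),
$$R_Y(t, s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)\,R_X(s - t + \alpha - \beta)\,d\alpha\,d\beta$$
Setting $s = t + \tau$, we get
$$R_Y(t, t + \tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)\,R_X(\tau + \alpha - \beta)\,d\alpha\,d\beta = R_Y(\tau) \tag{6.62}$$
From Eqs. (6.59) and (6.62), we see that the output $Y(t)$ is also WSS. Taking the Fourier transform of Eq. (6.62), the power spectral density of $Y(t)$ is given by (Prob. 6.25)
$$S_Y(\omega) = \int_{-\infty}^{\infty} R_Y(\tau)\,e^{-j\omega\tau}\,d\tau = |H(\omega)|^2 S_X(\omega) \tag{6.63}$$
Thus, we obtain the important result that the power spectral density of the output is the product of the power spectral density of the input and the magnitude squared of the frequency response of the system.

When the autocorrelation function of the output $R_Y(\tau)$ is desired, it is easier to determine the power spectral density $S_Y(\omega)$ and then evaluate the inverse Fourier transform (Prob. 6.26). Thus,
$$R_Y(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} |H(\omega)|^2 S_X(\omega)\,e^{j\omega\tau}\,d\omega \tag{6.64}$$
By Eq. (6.15), the average power in the output $Y(t)$ is
$$E[Y^2(t)] = R_Y(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} |H(\omega)|^2 S_X(\omega)\,d\omega \tag{6.65}$$

C. Response of a Discrete-Time Linear System to Random Input:

When the input to a discrete-time LTI system is a discrete-time random process $X(n)$, then by Eq. (6.52), the output $Y(n)$ is
$$Y(n) = \sum_{i=-\infty}^{\infty} h(i)\,X(n - i) \tag{6.66}$$
The autocorrelation function of $Y(n)$ is given by
$$R_Y(n, m) = \sum_{i=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(i)h(l)\,R_X(n - i, m - l) \tag{6.67}$$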

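When $X(n)$ is WSS, then from Eq. (6.66),
$$E[Y(n)] = \mu_X\sum_{i=-\infty}^{\infty} h(i) = \mu_X H(0) \tag{6.68}$$
where $H(0) = H(\Omega)|_{\Omega = 0}$ and $H(\Omega)$ is the frequency response of the system defined by the Fourier transform of $h(n)$:
$$H(\Omega) = \sum_{n=-\infty}^{\infty} h(n)\,e^{-j\Omega n}$$
The autocorrelation function of $Y(n)$ is, from Eq. (6.67),
$$R_Y(n, m) = \sum_{i=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(i)h(l)\,R_X(m - n + i - l)$$
Setting $m = n + k$, we get
$$R_Y(n, n + k) = \sum_{i=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(i)h(l)\,R_X(k + i - l) = R_Y(k) \tag{6.71}$$
From Eqs. (6.68) and (6.71), we see that the output $Y(n)$ is also WSS. Taking the Fourier transform of Eq. (6.71), the power spectral density of $Y(n)$ is given by (Prob. 6.28)
$$S_Y(\Omega) = |H(\Omega)|^2 S_X(\Omega) \tag{6.72}$$
which is the same as Eq. (6.63).

6.6 FOURIER SERIES AND KARHUNEN-LOÈVE EXPANSIONS

A. Stochastic Periodicity:

A continuous-time random process $X(t)$ is said to be m.s. periodic with period $T$ if
$$E\{[X(t + T) - X(t)]^2\} = 0 \tag{6.73}$$
If $X(t)$ is WSS, then $X(t)$ is m.s. periodic if and only if its autocorrelation function is periodic with period $T$; that is,
$$R_X(\tau + T) = R_X(\tau) \tag{6.74}$$

B. Fourier Series:

Let $X(t)$ be a WSS random process with periodic $R_X(\tau)$ having period $T$. Expanding $R_X(\tau)$ into a Fourier series, we obtain
$$R_X(\tau) = \sum_{n=-\infty}^{\infty} c_n\,e^{jn\omega_0\tau} \qquad \omega_0 = 2\pi/T \tag{6.75}$$
where
$$c_n = \frac{1}{T}\int_0^T R_X(\tau)\,e^{-jn\omega_0\tau}\,d\tau \tag{6.76}$$
Let $\hat{X}(t)$ be expressed as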

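$$\hat{X}(t) = \sum_{n=-\infty}^{\infty} X_n\,e^{jn\omega_0 t} \tag{6.77}$$
where $X_n$ are r.v.'s given by
$$X_n = \frac{1}{T}\int_0^T X(t)\,e^{-jn\omega_0 t}\,dt \tag{6.78}$$
Note that, in general, $X_n$ are complex-valued r.v.'s. For complex-valued r.v.'s, the correlation between two r.v.'s $X$ and $Y$ is defined by $E(XY^*)$. Then $\hat{X}(t)$ is called the m.s. Fourier series of $X(t)$ such that (Prob. 6.34)
$$E\{|X(t) - \hat{X}(t)|^2\} = 0 \tag{6.79}$$
Furthermore, we have (Prob. 6.33)
$$E(X_n) = \mu_X\delta(n) = \begin{cases} \mu_X & n = 0 \\ 0 & n \ne 0 \end{cases} \tag{6.80}$$
$$E(X_n X_m^*) = c_n\delta(n - m) = \begin{cases} c_n & n = m \\ 0 & n \ne m \end{cases} \tag{6.81}$$

C. Karhunen-Loève Expansion

Consider a random process $X(t)$ which is not periodic. Let $\hat{X}(t)$ be expressed as
$$\hat{X}(t) = \sum_{n=1}^{\infty} X_n\phi_n(t) \qquad 0 < t < T \tag{6.82}$$
where a set of functions $\{\phi_n(t)\}$ is orthonormal on an interval $(0, T)$ such that
$$\int_0^T \phi_n(t)\phi_m^*(t)\,dt = \delta(n - m) \tag{6.83}$$
and $X_n$ are r.v.'s given by
$$X_n = \int_0^T X(t)\phi_n^*(t)\,dt \tag{6.84}$$
Then $\hat{X}(t)$ is called the Karhunen-Loève expansion of $X(t)$ such that (Prob. 6.38)
$$E\{|X(t) - \hat{X}(t)|^2\} = 0 \tag{6.85}$$
Let $R_X(t, s)$ be the autocorrelation function of $X(t)$, and consider the following integral equation:
$$\int_0^T R_X(t, s)\phi_n(s)\,ds = \lambda_n\phi_n(t) \qquad 0 < t < T \tag{6.86}$$
where $\lambda_n$ and $\phi_n(t)$ are called the eigenvalues and the corresponding eigenfunctions of the integral equation (6.86). It is known from the theory of integral equations that if $R_X(t, s)$ is continuous, then $\phi_n(t)$ of Eq. (6.86) are orthonormal as in Eq. (6.83), and they satisfy the following identity:
$$R_X(t, s) = \sum_{n=1}^{\infty} \lambda_n\phi_n(t)\phi_n^*(s) \tag{6.87}$$
which is known as Mercer's theorem. With the above results, we can show that Eq. (6.85) is satisfied and the coefficients $X_n$ are orthogonal r.v.'s (Prob. 6.37); that is,
$$E(X_n X_m^*) = \lambda_n\delta(n - m) \tag{6.88}$$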

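6.7 FOURIER TRANSFORM OF RANDOM PROCESSES

A. Continuous-Time Random Processes:

The Fourier transform of a continuous-time random process $X(t)$ is a random process $\tilde{X}(\omega)$ given by
$$\tilde{X}(\omega) = \int_{-\infty}^{\infty} X(t)\,e^{-j\omega t}\,dt \tag{6.89}$$
which is a stochastic integral, interpreted as an m.s. limit. Note that $\tilde{X}(\omega)$ is a complex random process. Similarly, the inverse Fourier transform
$$X(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \tilde{X}(\omega)\,e^{j\omega t}\,d\omega \tag{6.91}$$
is also a stochastic integral and should also be interpreted in the m.s. sense. The properties of continuous-time Fourier transforms (Appendix B) also hold for random processes (or random signals). For instance, if $Y(t)$ is the output of a continuous-time LTI system with input $X(t)$, then
$$\tilde{Y}(\omega) = \tilde{X}(\omega)H(\omega)$$
where $H(\omega)$ is the frequency response of the system.

Let $\tilde{R}_X(\omega_1, \omega_2)$ be the two-dimensional Fourier transform of $R_X(t, s)$; that is,
$$\tilde{R}_X(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_X(t, s)\,e^{-j(\omega_1 t + \omega_2 s)}\,dt\,ds \tag{6.93}$$
Then the autocorrelation function of $\tilde{X}(\omega)$ is given by (Prob. 6.41)
$$R_{\tilde{X}}(\omega_1, \omega_2) = E[\tilde{X}(\omega_1)\tilde{X}^*(\omega_2)] = \tilde{R}_X(\omega_1, -\omega_2) \tag{6.94}$$
If $X(t)$ is a WSS random process with autocorrelation function $R_X(t, s) = R_X(t - s) = R_X(\tau)$ and power spectral density $S_X(\omega)$, then (Prob. 6.42)
$$\tilde{R}_X(\omega_1, \omega_2) = 2\pi S_X(\omega_1)\delta(\omega_1 + \omega_2) \tag{6.98}$$
$$R_{\tilde{X}}(\omega_1, \omega_2) = 2\pi S_X(\omega_1)\delta(\omega_1 - \omega_2) \tag{6.99}$$
Equation (6.99) shows that the Fourier transform of a WSS random process is nonstationary white noise.

B. Discrete-Time Random Processes:

The Fourier transform of a discrete-time random process $X(n)$ is a random process $\tilde{X}(\Omega)$ given by (in m.s. sense)
$$\tilde{X}(\Omega) = \sum_{n=-\infty}^{\infty} X(n)\,e^{-j\Omega n} \tag{6.100}$$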

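Similarly, the inverse Fourier transform
$$X(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \tilde{X}(\Omega)\,e^{j\Omega n}\,d\Omega$$
should also be interpreted in the m.s. sense. Note that $\tilde{X}(\Omega + 2\pi) = \tilde{X}(\Omega)$ and the properties of discrete-time Fourier transforms (Appendix B) also hold for discrete-time random signals. For instance, if $Y(n)$ is the output of a discrete-time LTI system with input $X(n)$, then
$$\tilde{Y}(\Omega) = \tilde{X}(\Omega)H(\Omega)$$
where $H(\Omega)$ is the frequency response of the system.

Let $\tilde{R}_X(\Omega_1, \Omega_2)$ be the two-dimensional Fourier transform of $R_X(n, m)$:
$$\tilde{R}_X(\Omega_1, \Omega_2) = \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} R_X(n, m)\,e^{-j(\Omega_1 n + \Omega_2 m)} \tag{6.103}$$
Then the autocorrelation function of $\tilde{X}(\Omega)$ is given by (Prob. 6.44)
$$R_{\tilde{X}}(\Omega_1, \Omega_2) = E[\tilde{X}(\Omega_1)\tilde{X}^*(\Omega_2)] = \tilde{R}_X(\Omega_1, -\Omega_2) \tag{6.104}$$
If $X(n)$ is a WSS random process with autocorrelation function $R_X(n, m) = R_X(n - m) = R_X(k)$ and power spectral density $S_X(\Omega)$, then
$$\tilde{R}_X(\Omega_1, \Omega_2) = 2\pi S_X(\Omega_1)\delta(\Omega_1 + \Omega_2) \tag{6.105}$$
$$R_{\tilde{X}}(\Omega_1, \Omega_2) = 2\pi S_X(\Omega_1)\delta(\Omega_1 - \Omega_2) \tag{6.106}$$
Equation (6.106) shows that the Fourier transform of a discrete-time WSS random process is nonstationary white noise.

Solved Problems

CONTINUITY, DIFFERENTIATION, INTEGRATION

6.1. Show that the random process $X(t)$ is m.s. continuous if and only if its autocorrelation function $R_X(t, s)$ is continuous.

We can write
$$E\{[X(t + \varepsilon) - X(t)]^2\} = R_X(t + \varepsilon, t + \varepsilon) - 2R_X(t + \varepsilon, t) + R_X(t, t) \tag{6.107}$$
Thus, if $R_X(t, s)$ is continuous, then
$$\lim_{\varepsilon\to 0} E\{[X(t + \varepsilon) - X(t)]^2\} = \lim_{\varepsilon\to 0}\{R_X(t + \varepsilon, t + \varepsilon) - 2R_X(t + \varepsilon, t) + R_X(t, t)\} = 0$$
and $X(t)$ is m.s. continuous. Next, consider
$$R_X(t + \varepsilon_1, t + \varepsilon_2) - R_X(t, t) = E\{[X(t + \varepsilon_1) - X(t)][X(t + \varepsilon_2) - X(t)]\} + E\{[X(t + \varepsilon_1) - X(t)]X(t)\} + E\{[X(t + \varepsilon_2) - X(t)]X(t)\}$$
Applying the Cauchy-Schwarz inequality (3.97) (Prob. 3.35), we obtain
$$|R_X(t + \varepsilon_1, t + \varepsilon_2) - R_X(t, t)| \le \left(E\{[X(t + \varepsilon_1) - X(t)]^2\}\,E\{[X(t + \varepsilon_2) - X(t)]^2\}\right)^{1/2} + \left(E\{[X(t + \varepsilon_1) - X(t)]^2\}\,E[X^2(t)]\right)^{1/2} + \left(E\{[X(t + \varepsilon_2) - X(t)]^2\}\,E[X^2(t)]\right)^{1/2}$$
Thus if $X(t)$ is m.s. continuous, then by Eq. (6.1) we have
$$\lim_{\varepsilon_1, \varepsilon_2\to 0} \left[R_X(t + \varepsilon_1, t + \varepsilon_2) - R_X(t, t)\right] = 0$$
that is, $R_X(t, s)$ is continuous. This completes the proof.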

Show that a WSS random process $X(t)$ is m.s. continuous if and only if its autocorrelation function $R_X(\tau)$ is continuous at $\tau = 0$.

If $X(t)$ is WSS, then Eq. (6.107) becomes
$$E\{[X(t + \varepsilon) - X(t)]^2\} = 2[R_X(0) - R_X(\varepsilon)] \tag{6.108}$$
Thus if $R_X(\tau)$ is continuous at $\tau = 0$, that is,
$$\lim_{\varepsilon\to 0}[R_X(\varepsilon) - R_X(0)] = 0$$
then
$$\lim_{\varepsilon\to 0} E\{[X(t + \varepsilon) - X(t)]^2\} = 0$$
that is, $X(t)$ is m.s. continuous. Similarly, we can show that if $X(t)$ is m.s. continuous, then by Eq. (6.108), $R_X(\tau)$ is continuous at $\tau = 0$.

Show that if $X(t)$ is m.s. continuous, then its mean is continuous; that is,
$$\lim_{\varepsilon\to 0}\mu_X(t + \varepsilon) = \mu_X(t)$$
We have
$$\mathrm{Var}[X(t + \varepsilon) - X(t)] = E\{[X(t + \varepsilon) - X(t)]^2\} - \{E[X(t + \varepsilon) - X(t)]\}^2 \ge 0$$
Thus
$$E\{[X(t + \varepsilon) - X(t)]^2\} \ge \{E[X(t + \varepsilon) - X(t)]\}^2 = [\mu_X(t + \varepsilon) - \mu_X(t)]^2$$
If $X(t)$ is m.s. continuous, then as $\varepsilon \to 0$, the left-hand side of the above expression approaches zero. Thus
$$\lim_{\varepsilon\to 0}[\mu_X(t + \varepsilon) - \mu_X(t)] = 0 \quad\text{or}\quad \lim_{\varepsilon\to 0}\mu_X(t + \varepsilon) = \mu_X(t)$$

6.4. Show that the Wiener process $X(t)$ is m.s. continuous.

From Eq. (5.64), the autocorrelation function of the Wiener process $X(t)$ is given by
$$R_X(t, s) = \sigma^2\min(t, s)$$
Thus, we have
$$|R_X(t + \varepsilon_1, t + \varepsilon_2) - R_X(t, t)| = \sigma^2|\min(t + \varepsilon_1, t + \varepsilon_2) - t| \le \sigma^2\max(|\varepsilon_1|, |\varepsilon_2|)$$
Since
$$\lim_{\varepsilon_1, \varepsilon_2\to 0}\max(|\varepsilon_1|, |\varepsilon_2|) = 0$$
$R_X(t, s)$ is continuous. Hence the Wiener process $X(t)$ is m.s. continuous.

6.5. Show that every m.s. continuous random process is continuous in probability.

A random process $X(t)$ is continuous in probability if, for every $t$ and $a > 0$ (see Prob. 5.62),
$$\lim_{\varepsilon\to 0} P\{|X(t + \varepsilon) - X(t)| > a\} = 0$$
Applying the Chebyshev inequality (2.97) (Prob. 2.37), we have
$$P\{|X(t + \varepsilon) - X(t)| > a\} \le \frac{E\{[X(t + \varepsilon) - X(t)]^2\}}{a^2}$$
Now, if $X(t)$ is m.s. continuous, then the right-hand side goes to 0 as $\varepsilon \to 0$, which implies that the left-hand side must also go to 0 as $\varepsilon \to 0$. Thus, we have proved that if $X(t)$ is m.s. continuous, then it is also continuous in probability.

Show that a random process $X(t)$ has a m.s. derivative $X'(t)$ if $\partial^2 R_X(t, s)/\partial t\,\partial s$ exists at $s = t$.

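Let
$$Y(t; \varepsilon) = \frac{X(t + \varepsilon) - X(t)}{\varepsilon}$$
By the Cauchy criterion (see the note at the end of this solution), the m.s. derivative $X'(t)$ exists if
$$\lim_{\varepsilon_1, \varepsilon_2\to 0} E\{[Y(t; \varepsilon_2) - Y(t; \varepsilon_1)]^2\} = 0$$
Now
$$E\{[Y(t; \varepsilon_2) - Y(t; \varepsilon_1)]^2\} = E[Y^2(t; \varepsilon_2)] - 2E[Y(t; \varepsilon_2)Y(t; \varepsilon_1)] + E[Y^2(t; \varepsilon_1)] \tag{6.111}$$
and
$$E[Y(t; \varepsilon_2)Y(t; \varepsilon_1)] = \frac{1}{\varepsilon_1\varepsilon_2}\left[R_X(t + \varepsilon_2, t + \varepsilon_1) - R_X(t + \varepsilon_2, t) - R_X(t, t + \varepsilon_1) + R_X(t, t)\right]$$
Thus
$$\lim_{\varepsilon_1, \varepsilon_2\to 0} E[Y(t; \varepsilon_2)Y(t; \varepsilon_1)] = \left.\frac{\partial^2 R_X(t, s)}{\partial t\,\partial s}\right|_{s=t} = R_2 \tag{6.112}$$
provided $\partial^2 R_X(t, s)/\partial t\,\partial s$ exists at $s = t$. Setting $\varepsilon_1 = \varepsilon_2$ in Eq. (6.112), we get
$$\lim_{\varepsilon_1\to 0} E[Y^2(t; \varepsilon_1)] = \lim_{\varepsilon_2\to 0} E[Y^2(t; \varepsilon_2)] = R_2$$
and by Eq. (6.111), we obtain
$$\lim_{\varepsilon_1, \varepsilon_2\to 0} E\{[Y(t; \varepsilon_2) - Y(t; \varepsilon_1)]^2\} = R_2 - 2R_2 + R_2 = 0$$
Thus, we conclude that $X(t)$ has a m.s. derivative $X'(t)$ if $\partial^2 R_X(t, s)/\partial t\,\partial s$ exists at $s = t$. If $X(t)$ is WSS, then the above conclusion is equivalent to the existence of $d^2 R_X(\tau)/d\tau^2$ at $\tau = 0$.

Note: In real analysis, a function $g(\varepsilon)$ of some parameter $\varepsilon$ converges to a finite value as $\varepsilon \to 0$ if
$$\lim_{\varepsilon_1, \varepsilon_2\to 0}[g(\varepsilon_2) - g(\varepsilon_1)] = 0$$
This is known as the Cauchy criterion.

6.7. Suppose a random process $X(t)$ has a m.s. derivative $X'(t)$. (a) Find $E[X'(t)]$. (b) Find the cross-correlation function of $X(t)$ and $X'(t)$. (c) Find the autocorrelation function of $X'(t)$.

(a) We have
$$E[X'(t)] = E\left[\underset{\varepsilon\to 0}{\mathrm{l.i.m.}}\frac{X(t + \varepsilon) - X(t)}{\varepsilon}\right] = \lim_{\varepsilon\to 0} E\left[\frac{X(t + \varepsilon) - X(t)}{\varepsilon}\right] = \lim_{\varepsilon\to 0}\frac{\mu_X(t + \varepsilon) - \mu_X(t)}{\varepsilon} = \mu_X'(t)$$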

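(b) From Eq. (6.17), the cross-correlation function of $X(t)$ and $X'(t)$ is
$$R_{XX'}(t, s) = E[X(t)X'(s)] = \lim_{\varepsilon\to 0}\frac{R_X(t, s + \varepsilon) - R_X(t, s)}{\varepsilon} = \frac{\partial R_X(t, s)}{\partial s} \tag{6.115}$$
(c) Using Eq. (6.115), the autocorrelation function of $X'(t)$ is
$$R_{X'}(t, s) = E[X'(t)X'(s)] = \lim_{\varepsilon\to 0}\frac{R_{XX'}(t + \varepsilon, s) - R_{XX'}(t, s)}{\varepsilon} = \frac{\partial R_{XX'}(t, s)}{\partial t} = \frac{\partial^2 R_X(t, s)}{\partial t\,\partial s} \tag{6.116}$$

6.8. If $X(t)$ is a WSS random process and has a m.s. derivative $X'(t)$, then show that
$$R_{XX'}(\tau) = \frac{d}{d\tau}R_X(\tau) \tag{6.117}$$
$$R_{X'}(\tau) = -\frac{d^2}{d\tau^2}R_X(\tau)$$
For a WSS process $X(t)$, $R_X(t, s) = R_X(s - t)$. Thus, setting $s - t = \tau$ in Eq. (6.115) of Prob. 6.7, we obtain $\partial R_X(s - t)/\partial s = dR_X(\tau)/d\tau$ and
$$R_{XX'}(t, t + \tau) = \frac{dR_X(\tau)}{d\tau} = R_{XX'}(\tau)$$
Now $\partial R_X(s - t)/\partial t = -dR_X(\tau)/d\tau$. Thus, $\partial^2 R_X(s - t)/\partial t\,\partial s = -d^2 R_X(\tau)/d\tau^2$, and by Eq. (6.116) of Prob. 6.7, we have
$$R_{X'}(t, t + \tau) = -\frac{d^2 R_X(\tau)}{d\tau^2} = R_{X'}(\tau)$$

6.9. Show that the Wiener process $X(t)$ does not have a m.s. derivative.

From Eq. (5.64), the autocorrelation function of the Wiener process $X(t)$ is given by
$$R_X(t, s) = \sigma^2\min(t, s) = \begin{cases} \sigma^2 s & t > s \\ \sigma^2 t & t < s \end{cases}$$
Thus
$$\frac{\partial R_X(t, s)}{\partial s} = \sigma^2 u(t - s) \tag{6.119}$$
where $u(t - s)$ is a unit step function defined by
$$u(t - s) = \begin{cases} 1 & t > s \\ 0 & t < s \end{cases}$$
and it is not continuous at $s = t$ (Fig. 6-2). Thus $\partial^2 R_X(t, s)/\partial t\,\partial s$ does not exist at $s = t$, and the Wiener process $X(t)$ does not have a m.s. derivative.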
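The failure of the m.s. derivative for the Wiener process can also be seen numerically. The sketch below is illustrative only: the function names are hypothetical, and it uses nothing beyond the identity $E\{[X(t+\varepsilon) - X(t)]^2\} = R_X(t+\varepsilon, t+\varepsilon) - 2R_X(t+\varepsilon, t) + R_X(t, t)$ from Prob. 6.1. For $R_X(t, s) = \sigma^2\min(t, s)$ the mean square of the difference quotient equals $\sigma^2/\varepsilon$ and grows without bound, whereas for a smooth autocorrelation (here $e^{-(t-s)^2}$, chosen purely as a contrasting assumption) it settles to a finite limit.

```python
import math


def wiener_autocorr(t, s, sigma2=1.0):
    """R_X(t, s) = sigma^2 * min(t, s), the Wiener-process autocorrelation (Eq. 5.64)."""
    return sigma2 * min(t, s)


def smooth_autocorr(t, s):
    """A smooth autocorrelation exp(-(t - s)^2), used only as a contrast (assumption)."""
    return math.exp(-((t - s) ** 2))


def mean_square_difference_quotient(R, t, eps):
    """E{[(X(t + eps) - X(t)) / eps]^2} expressed through the autocorrelation R."""
    return (R(t + eps, t + eps) - 2.0 * R(t + eps, t) + R(t, t)) / eps ** 2


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        w = mean_square_difference_quotient(wiener_autocorr, 1.0, eps)
        s = mean_square_difference_quotient(smooth_autocorr, 1.0, eps)
        print(f"eps={eps:g}  Wiener: {w:.3f}  smooth: {s:.6f}")
```

The Wiener column scales like $1/\varepsilon$, so the limit required by the Cauchy criterion cannot exist; the smooth column approaches $-R_X''(0) = 2$.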

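Fig. 6-2 Shifted unit step function.

Note that although a m.s. derivative does not exist for the Wiener process, we can define a generalized derivative of the Wiener process (see Prob. 6.20).

6.10. Show that if $X(t)$ is a normal random process for which the m.s. derivative $X'(t)$ exists, then $X'(t)$ is also a normal random process.

Let $X(t)$ be a normal random process. Now consider
$$Y_\varepsilon(t) = \frac{X(t + \varepsilon) - X(t)}{\varepsilon}$$
Then, the $n$ r.v.'s $Y_\varepsilon(t_1), Y_\varepsilon(t_2), \ldots, Y_\varepsilon(t_n)$ are given by a linear transformation of the jointly normal r.v.'s $X(t_1), X(t_1 + \varepsilon), X(t_2), X(t_2 + \varepsilon), \ldots, X(t_n), X(t_n + \varepsilon)$. Since a linear transformation of jointly normal r.v.'s is again jointly normal, it follows that $Y_\varepsilon(t_1), Y_\varepsilon(t_2), \ldots, Y_\varepsilon(t_n)$ are jointly normal r.v.'s, and hence $Y_\varepsilon(t)$ is a normal random process. Thus, we conclude that the m.s. derivative $X'(t)$, which is the limit of $Y_\varepsilon(t)$ as $\varepsilon \to 0$, is also a normal random process, since m.s. convergence implies convergence in probability (see Prob. 6.5).

Show that the m.s. integral of a random process $X(t)$ exists if the following integral exists:
$$\int_{t_0}^{t}\int_{t_0}^{t} R_X(\alpha, \beta)\,d\alpha\,d\beta$$
A m.s. integral of $X(t)$ is defined by
$$Y(t) = \int_{t_0}^{t} X(\alpha)\,d\alpha = \underset{\Delta t_i\to 0}{\mathrm{l.i.m.}}\sum_i X(t_i)\,\Delta t_i$$
Again using the Cauchy criterion, the m.s. integral $Y(t)$ of $X(t)$ exists if
$$\lim_{\Delta t_i, \Delta t_k\to 0} E\left\{\left[\sum_i X(t_i)\,\Delta t_i - \sum_k X(t_k)\,\Delta t_k\right]^2\right\} = 0 \tag{6.120}$$
As in the case of the m.s. derivative [Eq. (6.111)], expanding the square, we obtain
$$E\left\{\left[\sum_i X(t_i)\,\Delta t_i - \sum_k X(t_k)\,\Delta t_k\right]^2\right\} = \sum_i\sum_j R_X(t_i, t_j)\,\Delta t_i\,\Delta t_j - 2\sum_i\sum_k R_X(t_i, t_k)\,\Delta t_i\,\Delta t_k + \sum_k\sum_l R_X(t_k, t_l)\,\Delta t_k\,\Delta t_l$$
and Eq. (6.120) holds if
$$\lim_{\Delta t_i, \Delta t_k\to 0}\sum_i\sum_k R_X(t_i, t_k)\,\Delta t_i\,\Delta t_k$$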

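exists, or, equivalently,
$$\int_{t_0}^{t}\int_{t_0}^{t} R_X(\alpha, \beta)\,d\alpha\,d\beta$$
exists.

Let $X(t)$ be the Wiener process with parameter $\sigma^2$. Let
$$Y(t) = \int_0^t X(\alpha)\,d\alpha$$
(a) Find the mean and the variance of $Y(t)$. (b) Find the autocorrelation function of $Y(t)$.

(a) By assumption 3 of the Wiener process (Sec. 5.7), that is, $E[X(t)] = 0$, we have
$$E[Y(t)] = E\left[\int_0^t X(\alpha)\,d\alpha\right] = \int_0^t E[X(\alpha)]\,d\alpha = 0$$
Then
$$\mathrm{Var}[Y(t)] = E[Y^2(t)] = \int_0^t\int_0^t E[X(\alpha)X(\beta)]\,d\alpha\,d\beta = \int_0^t\int_0^t R_X(\alpha, \beta)\,d\alpha\,d\beta$$
By Eq. (5.64), $R_X(\alpha, \beta) = \sigma^2\min(\alpha, \beta)$; thus, referring to Fig. 6-3, we obtain
$$\mathrm{Var}[Y(t)] = \sigma^2\int_0^t d\beta\int_0^\beta \alpha\,d\alpha + \sigma^2\int_0^t d\alpha\int_0^\alpha \beta\,d\beta = \frac{\sigma^2 t^3}{3} \tag{6.122}$$
(b) Let $t > s \ge 0$ and write
$$Y(t) = \int_0^s X(\alpha)\,d\alpha + \int_s^t [X(\alpha) - X(s)]\,d\alpha + (t - s)X(s) = Y(s) + \int_s^t [X(\alpha) - X(s)]\,d\alpha + (t - s)X(s)$$
Then, for $t > s \ge 0$,
$$R_Y(t, s) = E[Y(t)Y(s)] = E[Y^2(s)] + \int_s^t E\{[X(\alpha) - X(s)]Y(s)\}\,d\alpha + (t - s)E[X(s)Y(s)] \tag{6.123}$$

Fig. 6-3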

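Now by Eq. (6.122),
$$E[Y^2(s)] = \frac{\sigma^2 s^3}{3}$$
Using assumptions 1, 3, and 4 of the Wiener process (Sec. 5.7), and since $s \le \alpha \le t$, we have
$$E\{[X(\alpha) - X(s)]Y(s)\} = \int_0^s E\{[X(\alpha) - X(s)][X(\beta) - X(0)]\}\,d\beta = \int_0^s E[X(\alpha) - X(s)]\,E[X(\beta) - X(0)]\,d\beta = 0$$
Finally, for $0 \le \beta \le s$,
$$E[X(s)Y(s)] = \int_0^s E[X(s)X(\beta)]\,d\beta = \int_0^s R_X(s, \beta)\,d\beta = \sigma^2\int_0^s \beta\,d\beta = \frac{\sigma^2 s^2}{2}$$
Substituting these results into Eq. (6.123), we get
$$R_Y(t, s) = \frac{\sigma^2 s^3}{3} + (t - s)\frac{\sigma^2 s^2}{2} = \frac{1}{6}\sigma^2 s^2(3t - s)$$
Since $R_Y(t, s) = R_Y(s, t)$, we obtain
$$R_Y(t, s) = \begin{cases} \frac{1}{6}\sigma^2 s^2(3t - s) & t > s \ge 0 \\ \frac{1}{6}\sigma^2 t^2(3s - t) & s > t \ge 0 \end{cases}$$

POWER SPECTRAL DENSITY

Verify Eqs. (6.13) and (6.14).

From Eq. (6.12),
$$R_X(\tau) = E[X(t)X(t + \tau)]$$
Setting $t + \tau = s$, we get
$$R_X(\tau) = E[X(s - \tau)X(s)] = E[X(s)X(s - \tau)] = R_X(-\tau)$$
Next, we have
$$E\{[X(t) \pm X(t + \tau)]^2\} \ge 0$$
Expanding the square, we have
$$E[X^2(t) \pm 2X(t)X(t + \tau) + X^2(t + \tau)] \ge 0$$
or
$$E[X^2(t)] \pm 2E[X(t)X(t + \tau)] + E[X^2(t + \tau)] \ge 0$$
Thus
$$2R_X(0) \pm 2R_X(\tau) \ge 0$$
from which we obtain Eq. (6.14); that is,
$$R_X(0) \ge |R_X(\tau)|$$

Verify Eqs. (6.18) to (6.20).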

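By Eq. (6.17),
$$R_{XY}(-\tau) = E[X(t)Y(t - \tau)]$$
Setting $t - \tau = s$, we get
$$R_{XY}(-\tau) = E[X(s + \tau)Y(s)] = E[Y(s)X(s + \tau)] = R_{YX}(\tau)$$
Next, from the Cauchy-Schwarz inequality, Eq. (3.97) (Prob. 3.35), it follows that
$$\{E[X(t)Y(t + \tau)]\}^2 \le E[X^2(t)]\,E[Y^2(t + \tau)]$$
or
$$[R_{XY}(\tau)]^2 \le R_X(0)R_Y(0)$$
from which we obtain Eq. (6.19); that is,
$$|R_{XY}(\tau)| \le \sqrt{R_X(0)R_Y(0)}$$
Now
$$E\{[X(t) - Y(t + \tau)]^2\} \ge 0$$
Expanding the square, we have
$$E[X^2(t) - 2X(t)Y(t + \tau) + Y^2(t + \tau)] \ge 0$$
or
$$E[X^2(t)] - 2E[X(t)Y(t + \tau)] + E[Y^2(t + \tau)] \ge 0$$
Thus
$$R_X(0) - 2R_{XY}(\tau) + R_Y(0) \ge 0$$
from which we obtain Eq. (6.20); that is,
$$R_{XY}(\tau) \le \tfrac{1}{2}[R_X(0) + R_Y(0)]$$

Two random processes $X(t)$ and $Y(t)$ are given by
$$X(t) = A\cos(\omega t + \Theta) \qquad Y(t) = A\sin(\omega t + \Theta)$$
where $A$ and $\omega$ are constants and $\Theta$ is a uniform r.v. over $(0, 2\pi)$. Find the cross-correlation function of $X(t)$ and $Y(t)$ and verify Eq. (6.18).

From Eq. (6.17), the cross-correlation function of $X(t)$ and $Y(t)$ is
$$R_{XY}(t, t + \tau) = E\{A^2\cos(\omega t + \Theta)\sin[\omega(t + \tau) + \Theta]\} = \frac{A^2}{2}E[\sin(2\omega t + \omega\tau + 2\Theta) + \sin\omega\tau] = \frac{A^2}{2}\sin\omega\tau = R_{XY}(\tau) \tag{6.125}$$
Similarly,
$$R_{YX}(t, t + \tau) = E\{A^2\sin(\omega t + \Theta)\cos[\omega(t + \tau) + \Theta]\} = \frac{A^2}{2}E[\sin(2\omega t + \omega\tau + 2\Theta) - \sin\omega\tau] = -\frac{A^2}{2}\sin\omega\tau = R_{YX}(\tau) \tag{6.126}$$
From Eqs. (6.125) and (6.126), we see that
$$R_{XY}(-\tau) = \frac{A^2}{2}\sin\omega(-\tau) = -\frac{A^2}{2}\sin\omega\tau = R_{YX}(\tau)$$
which verifies Eq. (6.18).

Show that the power spectrum of a (real) random process $X(t)$ is real and verify Eq. (6.26).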

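From Eq. (6.23) and expanding the exponential, we have
$$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j\omega\tau}\,d\tau = \int_{-\infty}^{\infty} R_X(\tau)\cos\omega\tau\,d\tau - j\int_{-\infty}^{\infty} R_X(\tau)\sin\omega\tau\,d\tau \tag{6.127}$$
Since $R_X(-\tau) = R_X(\tau)$, $R_X(\tau)\cos\omega\tau$ is an even function of $\tau$ and $R_X(\tau)\sin\omega\tau$ is an odd function of $\tau$, and hence the imaginary term in Eq. (6.127) vanishes and we obtain
$$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\cos\omega\tau\,d\tau \tag{6.128}$$
which indicates that $S_X(\omega)$ is real. Since $\cos(-\omega\tau) = \cos(\omega\tau)$, it follows that
$$S_X(-\omega) = S_X(\omega)$$
which indicates that the power spectrum of a real random process $X(t)$ is an even function of frequency.

Consider the random process
$$Y(t) = (-1)^{X(t)}$$
where $X(t)$ is a Poisson process with rate $\lambda$. Thus $Y(t)$ starts at $Y(0) = 1$ and switches back and forth from $+1$ to $-1$ at random Poisson times $t_i$, as shown in Fig. 6-4. The process $Y(t)$ is known as the semirandom telegraph signal because its initial value $Y(0) = 1$ is not random. (a) Find the mean of $Y(t)$. (b) Find the autocorrelation function of $Y(t)$.

(a) We have
$$Y(t) = \begin{cases} 1 & \text{if } X(t) \text{ is even} \\ -1 & \text{if } X(t) \text{ is odd} \end{cases}$$
Thus, using Eq. (5.59), we have
$$P[Y(t) = 1] = P[X(t) = \text{even integer}] = e^{-\lambda t}\left[1 + \frac{(\lambda t)^2}{2!} + \cdots\right] = e^{-\lambda t}\cosh\lambda t$$
$$P[Y(t) = -1] = P[X(t) = \text{odd integer}] = e^{-\lambda t}\left[\lambda t + \frac{(\lambda t)^3}{3!} + \cdots\right] = e^{-\lambda t}\sinh\lambda t$$

Fig. 6-4 Semirandom telegraph signal.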

Hence
$$\mu_Y(t) = E[Y(t)] = (1)P[Y(t) = 1] + (-1)P[Y(t) = -1] = e^{-\lambda t}(\cosh\lambda t - \sinh\lambda t) = e^{-2\lambda t}$$
(b) Similarly, since $Y(t)Y(t + \tau) = 1$ if there are an even number of events in $(t, t + \tau)$ for $\tau > 0$ and $Y(t)Y(t + \tau) = -1$ if there are an odd number of events, then for $t > 0$ and $t + \tau > 0$,
$$R_Y(t, t + \tau) = (1)\sum_{n\ \text{even}} e^{-\lambda\tau}\frac{(\lambda\tau)^n}{n!} + (-1)\sum_{n\ \text{odd}} e^{-\lambda\tau}\frac{(\lambda\tau)^n}{n!} = e^{-\lambda\tau}\sum_{n=0}^{\infty}\frac{(-\lambda\tau)^n}{n!} = e^{-\lambda\tau}e^{-\lambda\tau} = e^{-2\lambda\tau}$$
which indicates that $R_Y(t, t + \tau) = R_Y(\tau)$, and by Eq. (6.13),
$$R_Y(\tau) = e^{-2\lambda|\tau|} \tag{6.130}$$
Note that since $E[Y(t)]$ is not a constant, $Y(t)$ is not WSS.

Consider the random process
$$Z(t) = AY(t)$$
where $Y(t)$ is the semirandom telegraph signal of the preceding problem and $A$ is a r.v. independent of $Y(t)$ that takes on the values $\pm 1$ with equal probability. The process $Z(t)$ is known as the random telegraph signal. (a) Show that $Z(t)$ is WSS. (b) Find the power spectral density of $Z(t)$.

(a) Since $E(A) = 0$ and $E(A^2) = 1$, the mean of $Z(t)$ is
$$\mu_Z(t) = E[Z(t)] = E(A)E[Y(t)] = 0$$
and the autocorrelation of $Z(t)$ is
$$R_Z(t, t + \tau) = E[A^2 Y(t)Y(t + \tau)] = E(A^2)E[Y(t)Y(t + \tau)] = R_Y(t, t + \tau)$$
Thus, using Eq. (6.130), we obtain
$$R_Z(t, t + \tau) = R_Z(\tau) = e^{-2\lambda|\tau|} \tag{6.132}$$
Thus, we see that $Z(t)$ is WSS.

(b) Taking the Fourier transform of Eq. (6.132) (see Appendix B), we see that the power spectrum of $Z(t)$ is given by
$$S_Z(\omega) = \frac{4\lambda}{\omega^2 + 4\lambda^2}$$

Let $X(t)$ and $Y(t)$ be both zero-mean and WSS random processes. Consider the random process $Z(t)$ defined by
$$Z(t) = X(t) + Y(t)$$
(a) Determine the autocorrelation function and the power spectral density of $Z(t)$, (i) if $X(t)$ and $Y(t)$ are jointly WSS; (ii) if $X(t)$ and $Y(t)$ are orthogonal. (b) Show that if $X(t)$ and $Y(t)$ are orthogonal, then the mean square of $Z(t)$ is equal to the sum of the mean squares of $X(t)$ and $Y(t)$.

(a) The autocorrelation of $Z(t)$ is given by
$$R_Z(t, s) = E\{[X(t) + Y(t)][X(s) + Y(s)]\} = R_X(t, s) + R_{XY}(t, s) + R_{YX}(t, s) + R_Y(t, s)$$
(i) If $X(t)$ and $Y(t)$ are jointly WSS, then we have
$$R_Z(\tau) = R_X(\tau) + R_{XY}(\tau) + R_{YX}(\tau) + R_Y(\tau)$$
where $\tau = s - t$. Taking the Fourier transform of the above expression, we obtain
$$S_Z(\omega) = S_X(\omega) + S_{XY}(\omega) + S_{YX}(\omega) + S_Y(\omega)$$
(ii) If $X(t)$ and $Y(t)$ are orthogonal [Eq. (6.21)],
$$R_{XY}(\tau) = R_{YX}(\tau) = 0$$
Then
$$R_Z(\tau) = R_X(\tau) + R_Y(\tau) \tag{6.134a}$$
$$S_Z(\omega) = S_X(\omega) + S_Y(\omega) \tag{6.134b}$$
(b) Setting $\tau = 0$ in Eq. (6.134a), and using Eq. (6.15), we get
$$E[Z^2(t)] = E[X^2(t)] + E[Y^2(t)]$$
which indicates that the mean square of $Z(t)$ is equal to the sum of the mean squares of $X(t)$ and $Y(t)$.

WHITE NOISE

6.20. Using the notion of generalized derivative, show that the generalized derivative $X'(t)$ of the Wiener process $X(t)$ is a white noise.

From Eq. (5.64),
$$R_X(t, s) = \sigma^2\min(t, s)$$
and from Eq. (6.119) (Prob. 6.9), we have
$$\frac{\partial R_X(t, s)}{\partial s} = \sigma^2 u(t - s) \tag{6.135}$$
Now, using the $\delta$ function, the generalized derivative of a unit step function $u(t)$ is given by
$$\frac{d}{dt}u(t) = \delta(t)$$
Applying the above relation to Eq. (6.135), we obtain
$$\frac{\partial^2 R_X(t, s)}{\partial t\,\partial s} = \sigma^2\delta(t - s)$$
which is, by Eq. (6.116) (Prob. 6.7), the autocorrelation function of the generalized derivative $X'(t)$ of the Wiener process $X(t)$; that is,
$$R_{X'}(t, s) = \sigma^2\delta(t - s) = \sigma^2\delta(\tau) \tag{6.137}$$
where $\tau = t - s$. Thus, by definition (6.43), we see that the generalized derivative $X'(t)$ of the Wiener process $X(t)$ is a white noise. Recall that the Wiener process is a normal process and its derivative is also normal (see Prob. 6.10). Hence, the generalized derivative $X'(t)$ of the Wiener process is called white normal (or white gaussian) noise.

Let $X(t)$ be a Poisson process with rate $\lambda$. Let
$$Y(t) = X(t) - \lambda t$$
Show that the generalized derivative $Y'(t)$ of $Y(t)$ is a white noise.

Since $Y(t) = X(t) - \lambda t$, we have formally
$$Y'(t) = X'(t) - \lambda$$

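Then
$$E[Y'(t)] = E[X'(t) - \lambda] = E[X'(t)] - \lambda \tag{6.139}$$
$$R_{Y'}(t, s) = E[Y'(t)Y'(s)] = E\{[X'(t) - \lambda][X'(s) - \lambda]\} = E[X'(t)X'(s)] - \lambda E[X'(s)] - \lambda E[X'(t)] + \lambda^2 \tag{6.140}$$
Now, from Eqs. (5.56) and (5.60), we have
$$E[X(t)] = \lambda t \qquad R_X(t, s) = \lambda\min(t, s) + \lambda^2 ts$$
Thus
$$E[X'(t)] = \lambda \quad\text{and}\quad E[X'(s)] = \lambda \tag{6.141}$$
and from Eqs. (6.7) and (6.137),
$$E[X'(t)X'(s)] = \frac{\partial^2 R_X(t, s)}{\partial t\,\partial s} = \lambda\delta(t - s) + \lambda^2 \tag{6.142}$$
Substituting Eq. (6.141) into Eq. (6.139), we obtain
$$E[Y'(t)] = 0$$
Substituting Eqs. (6.141) and (6.142) into Eq. (6.140), we get
$$R_{Y'}(t, s) = \lambda\delta(t - s)$$
Hence we see that $Y'(t)$ is a zero-mean WSS random process, and by definition (6.43), $Y'(t)$ is a white noise with $\sigma^2 = \lambda$. The process $Y'(t)$ is known as the Poisson white noise.

Let $X(t)$ be a white normal noise. Let
$$Y(t) = \int_0^t X(\alpha)\,d\alpha$$
(a) Find the autocorrelation function of $Y(t)$. (b) Show that $Y(t)$ is the Wiener process.

(a) From Eq. (6.137) of Prob. 6.20,
$$R_X(t, s) = \sigma^2\delta(t - s)$$
Thus, by Eq. (6.11), the autocorrelation function of $Y(t)$ is
$$R_Y(t, s) = \int_0^t\int_0^s R_X(\alpha, \beta)\,d\beta\,d\alpha = \int_0^t\int_0^s \sigma^2\delta(\alpha - \beta)\,d\beta\,d\alpha = \sigma^2\int_0^{\min(t, s)} d\beta = \sigma^2\min(t, s) \tag{6.145}$$
(b) Comparing Eq. (6.145) and Eq. (5.64), we see that $Y(t)$ has the same autocorrelation function as the Wiener process. In addition, $Y(t)$ is normal, since $X(t)$ is a normal process, and $Y(0) = 0$. Thus, we conclude that $Y(t)$ is the Wiener process.

Let $Y(n) = X(n) + W(n)$, where $X(n) = A$ (for all $n$) and $A$ is a r.v. with zero mean and variance $\sigma_A^2$, and $W(n)$ is a discrete-time white noise with average power $\sigma^2$. It is also assumed that $X(n)$ and $W(n)$ are independent. (a) Show that $Y(n)$ is WSS.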

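(b) Find the power spectral density $S_Y(\Omega)$ of $Y(n)$.

(a) The mean of $Y(n)$ is
$$E[Y(n)] = E[X(n)] + E[W(n)] = E(A) + E[W(n)] = 0$$
The autocorrelation function of $Y(n)$ is
$$R_Y(n, n + k) = E\{[X(n) + W(n)][X(n + k) + W(n + k)]\}$$
$$= E[X(n)X(n + k)] + E[X(n)]E[W(n + k)] + E[W(n)]E[X(n + k)] + E[W(n)W(n + k)]$$
$$= E(A^2) + R_W(k) = \sigma_A^2 + \sigma^2\delta(k) = R_Y(k) \tag{6.146}$$
Thus $Y(n)$ is WSS.

(b) Taking the Fourier transform of Eq. (6.146), we obtain
$$S_Y(\Omega) = 2\pi\sigma_A^2\delta(\Omega) + \sigma^2 \qquad -\pi < \Omega < \pi$$

RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS

6.24. Derive Eq. (6.58).

Using Eq. (6.56), we have
$$R_Y(t, s) = E[Y(t)Y(s)] = E\left[\int_{-\infty}^{\infty} h(\alpha)X(t - \alpha)\,d\alpha\int_{-\infty}^{\infty} h(\beta)X(s - \beta)\,d\beta\right]$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)\,E[X(t - \alpha)X(s - \beta)]\,d\alpha\,d\beta = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)\,R_X(t - \alpha, s - \beta)\,d\alpha\,d\beta$$

6.25. Derive Eq. (6.63).

From Eq. (6.62), we have
$$R_Y(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)\,R_X(\tau + \alpha - \beta)\,d\alpha\,d\beta$$
Taking the Fourier transform of $R_Y(\tau)$, we obtain
$$S_Y(\omega) = \int_{-\infty}^{\infty} R_Y(\tau)\,e^{-j\omega\tau}\,d\tau = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)\,R_X(\tau + \alpha - \beta)\,e^{-j\omega\tau}\,d\tau\,d\alpha\,d\beta$$
Letting $\tau + \alpha - \beta = \lambda$, we get
$$S_Y(\omega) = \int_{-\infty}^{\infty} h(\alpha)\,e^{j\omega\alpha}\,d\alpha\int_{-\infty}^{\infty} h(\beta)\,e^{-j\omega\beta}\,d\beta\int_{-\infty}^{\infty} R_X(\lambda)\,e^{-j\omega\lambda}\,d\lambda = H(-\omega)H(\omega)S_X(\omega) = H^*(\omega)H(\omega)S_X(\omega) = |H(\omega)|^2 S_X(\omega)$$

6.26. A WSS random process $X(t)$ with autocorrelation function
$$R_X(\tau) = e^{-a|\tau|}$$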

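where $a$ is a real positive constant, is applied to the input of an LTI system with impulse response
$$h(t) = e^{-bt}u(t)$$
where $b$ is a real positive constant. Find the autocorrelation function of the output $Y(t)$ of the system.

The frequency response $H(\omega)$ of the system is
$$H(\omega) = \frac{1}{j\omega + b}$$
The power spectral density of $X(t)$ is
$$S_X(\omega) = \frac{2a}{\omega^2 + a^2}$$
By Eq. (6.63), the power spectral density of $Y(t)$ is
$$S_Y(\omega) = |H(\omega)|^2 S_X(\omega) = \left(\frac{1}{\omega^2 + b^2}\right)\left(\frac{2a}{\omega^2 + a^2}\right) = \frac{a}{(a^2 - b^2)b}\left(\frac{2b}{\omega^2 + b^2}\right) - \frac{1}{a^2 - b^2}\left(\frac{2a}{\omega^2 + a^2}\right)$$
Taking the inverse Fourier transform of both sides of the above equation, we obtain
$$R_Y(\tau) = \frac{1}{(a^2 - b^2)b}\left(a\,e^{-b|\tau|} - b\,e^{-a|\tau|}\right)$$

Verify Eq. (6.25), that is, the power spectral density of any WSS process $X(t)$ is real and $S_X(\omega) \ge 0$.

The realness of $S_X(\omega)$ was shown in an earlier problem. Consider an ideal bandpass filter with frequency response (Fig. 6-5)
$$H(\omega) = \begin{cases} 1 & \omega_1 < |\omega| < \omega_2 \\ 0 & \text{otherwise} \end{cases}$$
with a random process $X(t)$ as its input. From Eq. (6.63), it follows that the power spectral density $S_Y(\omega)$ of the output $Y(t)$ equals
$$S_Y(\omega) = \begin{cases} S_X(\omega) & \omega_1 < |\omega| < \omega_2 \\ 0 & \text{otherwise} \end{cases}$$
Hence, from Eq. (6.27), we have
$$E[Y^2(t)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_Y(\omega)\,d\omega = 2\left(\frac{1}{2\pi}\right)\int_{\omega_1}^{\omega_2} S_X(\omega)\,d\omega \ge 0$$
which indicates that the area of $S_X(\omega)$ in any interval of $\omega$ is nonnegative. This is possible only if $S_X(\omega) \ge 0$ for every $\omega$.

Fig. 6-5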
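As a cross-check on the closed form found for $R_Y(\tau)$ in the exponential-input problem above, Eq. (6.62) can be evaluated directly by numerical integration. The following is only a sketch under stated assumptions: the function names, the truncation point `upper`, the grid size `n`, and the test values $a = 2$, $b = 1$ are illustrative choices, not part of the text. It uses a plain midpoint rule with $h(t) = e^{-bt}u(t)$ and $R_X(\tau) = e^{-a|\tau|}$.

```python
import math


def ry_closed_form(tau, a, b):
    """Closed form R_Y(tau) = (a e^{-b|tau|} - b e^{-a|tau|}) / ((a^2 - b^2) b)."""
    return (a * math.exp(-b * abs(tau)) - b * math.exp(-a * abs(tau))) / ((a * a - b * b) * b)


def ry_numeric(tau, a, b, upper=15.0, n=300):
    """Midpoint-rule approximation of the double integral in Eq. (6.62).

    h(alpha) = exp(-b alpha) for alpha >= 0, so both integrals run over [0, upper];
    `upper` truncates the infinite range (an assumption that the tail is negligible).
    """
    step = upper / n
    points = [(i + 0.5) * step for i in range(n)]
    h_values = [math.exp(-b * p) for p in points]
    total = 0.0
    for alpha, h_alpha in zip(points, h_values):
        for beta, h_beta in zip(points, h_values):
            total += h_alpha * h_beta * math.exp(-a * abs(tau + alpha - beta))
    return total * step * step


if __name__ == "__main__":
    for tau in (0.0, 0.5, 1.0):
        print(tau, ry_numeric(tau, 2.0, 1.0), ry_closed_form(tau, 2.0, 1.0))
```

The two columns should agree to within the discretization error of the midpoint rule, which supports the partial-fraction step used above. Setting $\tau = 0$ in the closed form also gives the output average power $R_Y(0) = 1/[b(a + b)]$.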

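6.28. Verify Eq. (6.72).

From Eq. (6.71), we have
$$R_Y(k) = \sum_{i=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(i)h(l)\,R_X(k + i - l)$$
Taking the Fourier transform of $R_Y(k)$, we obtain
$$S_Y(\Omega) = \sum_{k=-\infty}^{\infty} R_Y(k)\,e^{-j\Omega k} = \sum_{k=-\infty}^{\infty}\sum_{i=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(i)h(l)\,R_X(k + i - l)\,e^{-j\Omega k}$$
Letting $k + i - l = n$, we get
$$S_Y(\Omega) = \sum_{i=-\infty}^{\infty} h(i)\,e^{j\Omega i}\sum_{l=-\infty}^{\infty} h(l)\,e^{-j\Omega l}\sum_{n=-\infty}^{\infty} R_X(n)\,e^{-j\Omega n} = H(-\Omega)H(\Omega)S_X(\Omega) = H^*(\Omega)H(\Omega)S_X(\Omega) = |H(\Omega)|^2 S_X(\Omega)$$

The discrete-time system shown in Fig. 6-6 consists of one unit delay element and one scalar multiplier ($a < 1$). The input $X(n)$ is discrete-time white noise with average power $\sigma^2$. Find the spectral density and average power of the output $Y(n)$.

Fig. 6-6

From Fig. 6-6, $Y(n)$ and $X(n)$ are related by
$$Y(n) = aY(n - 1) + X(n)$$
The impulse response $h(n)$ of the system is defined by
$$h(n) = ah(n - 1) + \delta(n) \tag{6.149}$$
Solving Eq. (6.149), we obtain
$$h(n) = a^n u(n) \tag{6.150}$$
where $u(n)$ is the unit step sequence defined by
$$u(n) = \begin{cases} 1 & n \ge 0 \\ 0 & n < 0 \end{cases}$$
Taking the Fourier transform of Eq. (6.150), we obtain
$$H(\Omega) = \sum_{n=0}^{\infty} a^n e^{-j\Omega n} = \frac{1}{1 - a\,e^{-j\Omega}} \qquad a < 1,\ |\Omega| < \pi$$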

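Now, by Eq. (6.48),
$$S_X(\Omega) = \sigma^2 \qquad |\Omega| < \pi$$
and by Eq. (6.72), the power spectral density of $Y(n)$ is
$$S_Y(\Omega) = |H(\Omega)|^2 S_X(\Omega) = \frac{\sigma^2}{(1 - a\,e^{-j\Omega})(1 - a\,e^{j\Omega})} = \frac{\sigma^2}{1 + a^2 - 2a\cos\Omega} \qquad |\Omega| < \pi \tag{6.151}$$
Taking the inverse Fourier transform of Eq. (6.151), we obtain
$$R_Y(k) = \frac{\sigma^2}{1 - a^2}\,a^{|k|}$$
Thus, by Eq. (6.33), the average power of $Y(n)$ is
$$E[Y^2(n)] = R_Y(0) = \frac{\sigma^2}{1 - a^2}$$

Let $Y(t)$ be the output of an LTI system with impulse response $h(t)$, when $X(t)$ is applied as input. Show that
$$\text{(a)}\quad R_{XY}(t, s) = \int_{-\infty}^{\infty} h(\beta)\,R_X(t, s - \beta)\,d\beta \tag{6.152}$$
$$\text{(b)}\quad R_Y(t, s) = \int_{-\infty}^{\infty} h(\alpha)\,R_{XY}(t - \alpha, s)\,d\alpha \tag{6.153}$$
(a) Using Eq. (6.56), we have
$$R_{XY}(t, s) = E[X(t)Y(s)] = E\left[X(t)\int_{-\infty}^{\infty} h(\beta)X(s - \beta)\,d\beta\right] = \int_{-\infty}^{\infty} h(\beta)\,E[X(t)X(s - \beta)]\,d\beta = \int_{-\infty}^{\infty} h(\beta)\,R_X(t, s - \beta)\,d\beta$$
(b) Similarly,
$$R_Y(t, s) = E[Y(t)Y(s)] = E\left[\int_{-\infty}^{\infty} h(\alpha)X(t - \alpha)\,d\alpha\;Y(s)\right] = \int_{-\infty}^{\infty} h(\alpha)\,E[X(t - \alpha)Y(s)]\,d\alpha = \int_{-\infty}^{\infty} h(\alpha)\,R_{XY}(t - \alpha, s)\,d\alpha$$

Let $Y(t)$ be the output of an LTI system with impulse response $h(t)$ when a WSS random process $X(t)$ is applied as input. Show that
$$\text{(a)}\quad S_{XY}(\omega) = H(\omega)S_X(\omega) \tag{6.154}$$
$$\text{(b)}\quad S_Y(\omega) = H^*(\omega)S_{XY}(\omega) \tag{6.155}$$
(a) If $X(t)$ is WSS, then Eq. (6.152) of the preceding problem becomes
$$R_{XY}(t, s) = \int_{-\infty}^{\infty} h(\beta)\,R_X(s - t - \beta)\,d\beta \tag{6.156}$$
which indicates that $R_{XY}(t, s)$ is a function of the time difference $\tau = s - t$ only. Hence
$$R_{XY}(\tau) = \int_{-\infty}^{\infty} h(\beta)\,R_X(\tau - \beta)\,d\beta \tag{6.157}$$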

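Taking the Fourier transform of Eq. (6.157), we obtain
$$S_{XY}(\omega) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\beta)\,R_X(\tau - \beta)\,e^{-j\omega\tau}\,d\beta\,d\tau = \int_{-\infty}^{\infty} h(\beta)\,e^{-j\omega\beta}\,d\beta\int_{-\infty}^{\infty} R_X(\lambda)\,e^{-j\omega\lambda}\,d\lambda = H(\omega)S_X(\omega)$$
(b) Similarly, if $X(t)$ is WSS, then by Eq. (6.156), Eq. (6.153) becomes
$$R_Y(t, s) = \int_{-\infty}^{\infty} h(\alpha)\,R_{XY}(s - t + \alpha)\,d\alpha$$
which indicates that $R_Y(t, s)$ is a function of the time difference $\tau = s - t$ only. Hence
$$R_Y(\tau) = \int_{-\infty}^{\infty} h(\alpha)\,R_{XY}(\tau + \alpha)\,d\alpha$$
Taking the Fourier transform of $R_Y(\tau)$, we obtain
$$S_Y(\omega) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)\,R_{XY}(\tau + \alpha)\,e^{-j\omega\tau}\,d\alpha\,d\tau = \int_{-\infty}^{\infty} h(\alpha)\,e^{j\omega\alpha}\,d\alpha\int_{-\infty}^{\infty} R_{XY}(\lambda)\,e^{-j\omega\lambda}\,d\lambda = H(-\omega)S_{XY}(\omega) = H^*(\omega)S_{XY}(\omega)$$
Note that from Eqs. (6.154) and (6.155), we obtain Eq. (6.63); that is,
$$S_Y(\omega) = H^*(\omega)S_{XY}(\omega) = H^*(\omega)H(\omega)S_X(\omega) = |H(\omega)|^2 S_X(\omega)$$

Consider a WSS process $X(t)$ with autocorrelation function $R_X(\tau)$ and power spectral density $S_X(\omega)$. Let $X'(t) = dX(t)/dt$. Show that
$$\text{(a)}\quad R_{XX'}(\tau) = \frac{d}{d\tau}R_X(\tau) \tag{6.159}$$
$$\text{(b)}\quad R_{X'}(\tau) = -\frac{d^2}{d\tau^2}R_X(\tau) \tag{6.160}$$
$$\text{(c)}\quad S_{X'}(\omega) = \omega^2 S_X(\omega)$$
(a) If $X(t)$ is the input to a differentiator, then its output is $Y(t) = X'(t)$. The frequency response of a differentiator is known to be $H(\omega) = j\omega$. Then from Eq. (6.154),
$$S_{XX'}(\omega) = H(\omega)S_X(\omega) = j\omega S_X(\omega)$$
Taking the inverse Fourier transform of both sides, we obtain
$$R_{XX'}(\tau) = \frac{d}{d\tau}R_X(\tau)$$
(b) From Eq. (6.155),
$$S_{X'}(\omega) = H^*(\omega)S_{XX'}(\omega) = -j\omega S_{XX'}(\omega)$$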

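Again taking the inverse Fourier transform of both sides and using the result of part (a), we have
$$R_{X'}(\tau) = -\frac{d}{d\tau}R_{XX'}(\tau) = -\frac{d^2}{d\tau^2}R_X(\tau)$$
(c) From Eq. (6.63),
$$S_{X'}(\omega) = |H(\omega)|^2 S_X(\omega) = |j\omega|^2 S_X(\omega) = \omega^2 S_X(\omega)$$
Note that Eqs. (6.159) and (6.160) were proved in Prob. 6.8 by a different method.

FOURIER SERIES AND KARHUNEN-LOÈVE EXPANSIONS

6.33. Verify Eqs. (6.80) and (6.81).

From Eq. (6.78),
$$E(X_n) = \frac{1}{T}\int_0^T E[X(t)]\,e^{-jn\omega_0 t}\,dt$$
Since $X(t)$ is WSS, $E[X(t)] = \mu_X$, and we have
$$E(X_n) = \mu_X\frac{1}{T}\int_0^T e^{-jn\omega_0 t}\,dt = \mu_X\delta(n)$$
Again using Eq. (6.78), we have
$$E(X_n X_m^*) = E\left[X_n\frac{1}{T}\int_0^T X^*(s)\,e^{jm\omega_0 s}\,ds\right] = \frac{1}{T}\int_0^T E[X_n X^*(s)]\,e^{jm\omega_0 s}\,ds$$
Now
$$E[X_n X^*(s)] = \frac{1}{T}\int_0^T E[X(t)X^*(s)]\,e^{-jn\omega_0 t}\,dt = \frac{1}{T}\int_0^T R_X(t - s)\,e^{-jn\omega_0 t}\,dt$$
Letting $t - s = \tau$, and using Eq. (6.76) together with the periodicity of $R_X(\tau)$, we obtain
$$E[X_n X^*(s)] = \left[\frac{1}{T}\int_0^T R_X(\tau)\,e^{-jn\omega_0\tau}\,d\tau\right]e^{-jn\omega_0 s} = c_n\,e^{-jn\omega_0 s} \tag{6.162}$$
Thus
$$E(X_n X_m^*) = c_n\frac{1}{T}\int_0^T e^{-j(n - m)\omega_0 s}\,ds = c_n\delta(n - m)$$

6.34. Let $\hat{X}(t)$ be the Fourier series representation of $X(t)$ shown in Eq. (6.77). Verify Eq. (6.79).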

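From Eq. (6.77), we have
$$E\{|X(t) - \hat{X}(t)|^2\} = E\left\{\left|X(t) - \sum_n X_n\,e^{jn\omega_0 t}\right|^2\right\}$$
$$= E[|X(t)|^2] - \sum_n E[X_n^* X(t)]\,e^{-jn\omega_0 t} - \sum_n E[X_n X^*(t)]\,e^{jn\omega_0 t} + \sum_n\sum_m E(X_n X_m^*)\,e^{j(n - m)\omega_0 t}$$
Now, by Eqs. (6.81) and (6.162), we have
$$E[X_n X^*(t)] = c_n\,e^{-jn\omega_0 t} \qquad E[X_n^* X(t)] = c_n^*\,e^{jn\omega_0 t} \qquad E(X_n X_m^*) = c_n\delta(n - m)$$
Using these results, finally we obtain
$$E\{|X(t) - \hat{X}(t)|^2\} = R_X(0) - \sum_n c_n^* - \sum_n c_n + \sum_n c_n = 0$$
since each sum above equals $R_X(0)$ [see Eq. (6.75)].

Let $X(t)$ be m.s. periodic and represented by the Fourier series [Eq. (6.77)]
$$X(t) = \sum_{n=-\infty}^{\infty} X_n\,e^{jn\omega_0 t}$$
Show that
$$E[|X(t)|^2] = \sum_{n=-\infty}^{\infty} E(|X_n|^2) \tag{6.163}$$
From Eq. (6.81), we have
$$E(|X_n|^2) = E(X_n X_n^*) = c_n$$
Setting $\tau = 0$ in Eq. (6.75), we obtain
$$E[|X(t)|^2] = R_X(0) = \sum_{n=-\infty}^{\infty} c_n = \sum_{n=-\infty}^{\infty} E(|X_n|^2)$$
Equation (6.163) is known as Parseval's theorem for the Fourier series.

If a random process $X(t)$ is represented by a Karhunen-Loève expansion [Eq. (6.82)]
$$X(t) = \sum_{n=1}^{\infty} X_n\phi_n(t) \qquad 0 < t < T$$
and the $X_n$'s are orthogonal, show that $\phi_n(t)$ must satisfy integral equation (6.86); that is,
$$\int_0^T R_X(t, s)\phi_n(s)\,ds = \lambda_n\phi_n(t) \qquad 0 < t < T$$
Consider
$$X(t)X_n^* = \sum_{m=1}^{\infty} X_m X_n^*\phi_m(t)$$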

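Then
$$E[X(t)X_n^*] = \sum_{m=1}^{\infty} E(X_m X_n^*)\phi_m(t) = E(|X_n|^2)\phi_n(t) \tag{6.165}$$
since the $X_n$'s are orthogonal; that is, $E(X_m X_n^*) = 0$ if $m \ne n$. But by Eq. (6.84),
$$E[X(t)X_n^*] = E\left[X(t)\int_0^T X^*(s)\phi_n(s)\,ds\right] = \int_0^T E[X(t)X^*(s)]\phi_n(s)\,ds = \int_0^T R_X(t, s)\phi_n(s)\,ds \tag{6.166}$$
Thus, equating Eqs. (6.165) and (6.166), we obtain
$$\int_0^T R_X(t, s)\phi_n(s)\,ds = \lambda_n\phi_n(t)$$
where $\lambda_n = E(|X_n|^2)$.

6.37. Let $\hat{X}(t)$ be the Karhunen-Loève expansion of $X(t)$ shown in Eq. (6.82). Verify Eq. (6.88).

From Eqs. (6.166) and (6.86), we have
$$E[X(t)X_n^*] = \int_0^T R_X(t, s)\phi_n(s)\,ds = \lambda_n\phi_n(t) \tag{6.167}$$
Now by Eqs. (6.83), (6.84), and (6.167) we obtain
$$E(X_m X_n^*) = E\left[\int_0^T X(t)\phi_m^*(t)\,dt\;X_n^*\right] = \int_0^T E[X(t)X_n^*]\phi_m^*(t)\,dt = \lambda_n\int_0^T \phi_n(t)\phi_m^*(t)\,dt = \lambda_n\delta(m - n) \tag{6.168}$$

6.38. Let $\hat{X}(t)$ be the Karhunen-Loève expansion of $X(t)$ shown in Eq. (6.82). Verify Eq. (6.85).

From Eq. (6.82), we have
$$E\{|X(t) - \hat{X}(t)|^2\} = E\left\{\left|X(t) - \sum_n X_n\phi_n(t)\right|^2\right\}$$
$$= E[|X(t)|^2] - \sum_n E[X(t)X_n^*]\phi_n^*(t) - \sum_n E[X^*(t)X_n]\phi_n(t) + \sum_n\sum_m E(X_n X_m^*)\phi_n(t)\phi_m^*(t)$$
Using Eqs. (6.167) and (6.168), we have
$$E\{|X(t) - \hat{X}(t)|^2\} = R_X(t, t) - \sum_n \lambda_n\phi_n(t)\phi_n^*(t) - \sum_n \lambda_n^*\phi_n^*(t)\phi_n(t) + \sum_n \lambda_n\phi_n(t)\phi_n^*(t) = 0$$
since by Mercer's theorem [Eq. (6.87)]
$$R_X(t, t) = \sum_{n=1}^{\infty} \lambda_n\phi_n(t)\phi_n^*(t)$$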

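and $\lambda_n = E(|X_n|^2) = \lambda_n^*$.

Find the Karhunen-Loève expansion of the Wiener process $X(t)$.

From Eq. (5.64),
$$R_X(t, s) = \sigma^2\min(t, s) = \begin{cases} \sigma^2 s & s < t \\ \sigma^2 t & s > t \end{cases}$$
Substituting the above expression into Eq. (6.86), we obtain
$$\sigma^2\int_0^T \min(t, s)\phi_n(s)\,ds = \lambda_n\phi_n(t) \qquad 0 < t < T$$
or
$$\sigma^2\int_0^t s\,\phi_n(s)\,ds + \sigma^2 t\int_t^T \phi_n(s)\,ds = \lambda_n\phi_n(t) \tag{6.170}$$
Differentiating Eq. (6.170) with respect to $t$, we get
$$\sigma^2\int_t^T \phi_n(s)\,ds = \lambda_n\phi_n'(t) \tag{6.171}$$
Differentiating Eq. (6.171) with respect to $t$ again, we obtain
$$\phi_n''(t) + \frac{\sigma^2}{\lambda_n}\phi_n(t) = 0 \tag{6.172}$$
A general solution of Eq. (6.172) is
$$\phi_n(t) = a_n\sin\omega_n t + b_n\cos\omega_n t \qquad \omega_n = \sigma/\sqrt{\lambda_n}$$
In order to determine the values of $a_n$, $b_n$, and $\lambda_n$ (or $\omega_n$), we need appropriate boundary conditions. From Eq. (6.170), we see that $\phi_n(0) = 0$. This implies that $b_n = 0$. From Eq. (6.171), we see that $\phi_n'(T) = 0$. This implies that
$$\omega_n = \frac{\sigma}{\sqrt{\lambda_n}} = \frac{(2n - 1)\pi}{2T} = \frac{(n - \frac{1}{2})\pi}{T} \qquad n = 1, 2, \ldots$$
Therefore the eigenvalues are given by
$$\lambda_n = \frac{\sigma^2 T^2}{(n - \frac{1}{2})^2\pi^2} \qquad n = 1, 2, \ldots$$
The normalization requirement [Eq. (6.83)] implies that
$$\int_0^T (a_n\sin\omega_n t)^2\,dt = \frac{a_n^2 T}{2} = 1 \quad\Rightarrow\quad a_n = \sqrt{\frac{2}{T}}$$
Thus, the eigenfunctions are given by
$$\phi_n(t) = \sqrt{\frac{2}{T}}\sin\left(n - \frac{1}{2}\right)\frac{\pi}{T}t \qquad 0 < t < T$$
and the Karhunen-Loève expansion of the Wiener process $X(t)$ is
$$\hat{X}(t) = \sqrt{\frac{2}{T}}\sum_{n=1}^{\infty} X_n\sin\left(n - \frac{1}{2}\right)\frac{\pi}{T}t \qquad 0 < t < T \tag{6.175}$$
where $X_n$ are given by
$$X_n = \sqrt{\frac{2}{T}}\int_0^T X(t)\sin\left(n - \frac{1}{2}\right)\frac{\pi}{T}t\,dt$$
and they are uncorrelated with variance $\lambda_n$.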
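The eigenpairs derived above for the Wiener process can be checked numerically against the integral equation (6.86). The sketch below is illustrative, not a general Karhunen-Loève solver: the function names, the midpoint-rule grid size `steps`, and the sample values of $T$, $\sigma^2$, and $t$ are all assumptions introduced for the check. It evaluates $\int_0^T \sigma^2\min(t, s)\phi_n(s)\,ds$ and compares it with $\lambda_n\phi_n(t)$, and it also verifies the orthonormality condition (6.83).

```python
import math


def phi(n, t, T):
    """Eigenfunction phi_n(t) = sqrt(2/T) sin((n - 1/2) pi t / T)."""
    return math.sqrt(2.0 / T) * math.sin((n - 0.5) * math.pi * t / T)


def eigenvalue(n, T, sigma2):
    """Eigenvalue lambda_n = sigma^2 T^2 / ((n - 1/2)^2 pi^2)."""
    return sigma2 * T ** 2 / ((n - 0.5) ** 2 * math.pi ** 2)


def apply_kernel(n, t, T, sigma2, steps=2000):
    """Midpoint-rule value of the left side of Eq. (6.86) with R_X = sigma^2 min(t, s)."""
    h = T / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += sigma2 * min(t, s) * phi(n, s, T)
    return total * h


def inner_product(n, m, T, steps=2000):
    """Midpoint-rule value of the integral of phi_n(t) phi_m(t) over (0, T)."""
    h = T / steps
    return sum(phi(n, (k + 0.5) * h, T) * phi(m, (k + 0.5) * h, T) for k in range(steps)) * h


if __name__ == "__main__":
    T, sigma2, t = 2.0, 3.0, 0.7
    for n in (1, 2, 3):
        print(n, apply_kernel(n, t, T, sigma2), eigenvalue(n, T, sigma2) * phi(n, t, T))
```

For each $n$ the two printed values should agree up to the quadrature error, confirming that the boundary conditions $\phi_n(0) = 0$ and $\phi_n'(T) = 0$ select the right frequencies.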

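Find the Karhunen-Loève expansion of the white normal (or white gaussian) noise $W(t)$.

From Eq. (6.43),
$$R_W(t, s) = \sigma^2\delta(t - s)$$
Substituting the above expression into Eq. (6.86), we obtain
$$\sigma^2\int_0^T \delta(t - s)\phi_n(s)\,ds = \lambda_n\phi_n(t) \qquad 0 < t < T$$
or [by Eq. (6.44)]
$$\sigma^2\phi_n(t) = \lambda_n\phi_n(t)$$
which indicates that all $\lambda_n = \sigma^2$ and $\phi_n(t)$ are arbitrary. Thus, any complete orthonormal set $\{\phi_n(t)\}$ with corresponding eigenvalues $\lambda_n = \sigma^2$ can be used in the Karhunen-Loève expansion of the white gaussian noise.

FOURIER TRANSFORM OF RANDOM PROCESSES

6.41. Derive Eq. (6.94).

From Eq. (6.89),
$$R_{\tilde{X}}(\omega_1, \omega_2) = E[\tilde{X}(\omega_1)\tilde{X}^*(\omega_2)] = E\left[\int_{-\infty}^{\infty} X(t)\,e^{-j\omega_1 t}\,dt\int_{-\infty}^{\infty} X(s)\,e^{j\omega_2 s}\,ds\right]$$
Then
$$R_{\tilde{X}}(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_X(t, s)\,e^{-j[\omega_1 t + (-\omega_2)s]}\,dt\,ds = \tilde{R}_X(\omega_1, -\omega_2)$$
in view of Eq. (6.93).

6.42. Derive Eqs. (6.98) and (6.99).

Since $X(t)$ is WSS, by Eq. (6.93), and letting $t - s = \tau$, we have
$$\tilde{R}_X(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_X(t - s)\,e^{-j(\omega_1 t + \omega_2 s)}\,dt\,ds = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j\omega_1\tau}\,d\tau\int_{-\infty}^{\infty} e^{-j(\omega_1 + \omega_2)s}\,ds = S_X(\omega_1)\int_{-\infty}^{\infty} e^{-j(\omega_1 + \omega_2)s}\,ds$$
From the Fourier transform pair (Appendix B) $1 \leftrightarrow 2\pi\delta(\omega)$, we have
$$\int_{-\infty}^{\infty} e^{-j(\omega_1 + \omega_2)s}\,ds = 2\pi\delta(\omega_1 + \omega_2)$$
Hence
$$\tilde{R}_X(\omega_1, \omega_2) = 2\pi S_X(\omega_1)\delta(\omega_1 + \omega_2)$$
Next, from Eq. (6.94) and the above result, we obtain
$$R_{\tilde{X}}(\omega_1, \omega_2) = \tilde{R}_X(\omega_1, -\omega_2) = 2\pi S_X(\omega_1)\delta(\omega_1 - \omega_2)$$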

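Let $\tilde{X}(\omega)$ be the Fourier transform of a random process $X(t)$. If $\tilde{X}(\omega)$ is a white noise with zero mean and autocorrelation function $q(\omega_1)\delta(\omega_1 - \omega_2)$, then show that $X(t)$ is WSS with power spectral density $q(\omega)/2\pi$.

By Eq. (6.91),
$$X(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \tilde{X}(\omega)\,e^{j\omega t}\,d\omega$$
Then
$$E[X(t)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} E[\tilde{X}(\omega)]\,e^{j\omega t}\,d\omega = 0$$
Assuming that $X(t)$ is a complex random process, we have
$$R_X(t, s) = E[X(t)X^*(s)] = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E[\tilde{X}(\omega_1)\tilde{X}^*(\omega_2)]\,e^{j(\omega_1 t - \omega_2 s)}\,d\omega_1\,d\omega_2$$
$$= \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} q(\omega_1)\delta(\omega_1 - \omega_2)\,e^{j(\omega_1 t - \omega_2 s)}\,d\omega_1\,d\omega_2 = \frac{1}{4\pi^2}\int_{-\infty}^{\infty} q(\omega_1)\,e^{j\omega_1(t - s)}\,d\omega_1 \tag{6.178}$$
which depends only on $t - s = \tau$. Hence, we conclude that $X(t)$ is WSS. Setting $t - s = \tau$ and $\omega_1 = \omega$ in Eq. (6.178), we have
$$R_X(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{q(\omega)}{2\pi}\,e^{j\omega\tau}\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\,e^{j\omega\tau}\,d\omega$$
in view of Eq. (6.24). Thus, we obtain $S_X(\omega) = q(\omega)/2\pi$.

6.44. Verify Eq. (6.104).

By Eq. (6.100),
$$\tilde{X}(\Omega_1) = \sum_{n=-\infty}^{\infty} X(n)\,e^{-j\Omega_1 n} \qquad \tilde{X}^*(\Omega_2) = \sum_{m=-\infty}^{\infty} X^*(m)\,e^{j\Omega_2 m}$$
Then
$$R_{\tilde{X}}(\Omega_1, \Omega_2) = E[\tilde{X}(\Omega_1)\tilde{X}^*(\Omega_2)] = \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} E[X(n)X^*(m)]\,e^{-j(\Omega_1 n - \Omega_2 m)} = \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} R_X(n, m)\,e^{-j[\Omega_1 n + (-\Omega_2)m]} = \tilde{R}_X(\Omega_1, -\Omega_2)$$
in view of Eq. (6.103).

Derive Eqs. (6.105) and (6.106).

If $X(n)$ is WSS, then $R_X(n, m) = R_X(n - m)$. By Eq. (6.103), and letting $n - m = k$, we have
$$\tilde{R}_X(\Omega_1, \Omega_2) = \sum_{m=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} R_X(k)\,e^{-j[\Omega_1(k + m) + \Omega_2 m]} = \sum_{k=-\infty}^{\infty} R_X(k)\,e^{-j\Omega_1 k}\sum_{m=-\infty}^{\infty} e^{-j(\Omega_1 + \Omega_2)m} = S_X(\Omega_1)\sum_{m=-\infty}^{\infty} e^{-j(\Omega_1 + \Omega_2)m}$$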

From the Fourier transform pair (Appendix B) $x(n) = 1 \leftrightarrow 2\pi\delta(\Omega)$, we have
$$\sum_{m=-\infty}^{\infty} e^{-jm(\Omega_1 + \Omega_2)} = 2\pi\delta(\Omega_1 + \Omega_2)$$
Hence
$$\tilde{R}_X(\Omega_1, \Omega_2) = 2\pi S_X(\Omega_1)\delta(\Omega_1 + \Omega_2)$$
Next, from Eq. (6.104) and the above result, we obtain
$$R_{\tilde{X}}(\Omega_1, \Omega_2) = \tilde{R}_X(\Omega_1, -\Omega_2) = 2\pi S_X(\Omega_1)\delta(\Omega_1 - \Omega_2)$$

Supplementary Problems

Is the Poisson process $X(t)$ m.s. continuous?
Hint: Use Eq. (5.60) and proceed as in the solved problem on the Wiener process.
Ans. Yes

Let $X(t)$ be defined by (Prob. 5.4)
$$X(t) = Y\cos\omega t \qquad t \ge 0$$
where $Y$ is a uniform r.v. over $(0, 1)$ and $\omega$ is a constant. (a) Is $X(t)$ m.s. continuous? (b) Does $X(t)$ have a m.s. derivative?
Hint: Use Eq. (5.87).
Ans. (a) Yes; (b) yes

Let $Z(t)$ be the random telegraph signal of the solved problems. (a) Is $Z(t)$ m.s. continuous? (b) Does $Z(t)$ have a m.s. derivative?
Hint: Use Eq. (6.132).
Ans. (a) Yes; (b) no

Let $X(t)$ be a WSS random process, and let $X'(t)$ be its m.s. derivative. Show that $E[X(t)X'(t)] = 0$.
Hint: Use Eqs. (6.13) [or (6.14)] and (6.117).

Let
$$Z(t) = \frac{2}{T}\int_t^{t + T/2} X(\alpha)\,d\alpha$$
where $X(t) = Y\cos\omega t$ is the process of the second problem above, with $\omega = 2\pi/T$. (a) Find the mean of $Z(t)$. (b) Find the autocorrelation function of $Z(t)$.
Ans. (a) $-\dfrac{1}{\pi}\sin\omega t$
(b) $R_Z(t, s) = \dfrac{4}{3\pi^2}\sin\omega t\,\sin\omega s$

Consider a WSS random process $X(t)$ with $E[X(t)] = \mu_X$. Let
$$\langle X(t)\rangle_T = \frac{1}{T}\int_{-T/2}^{T/2} X(t)\,dt$$
The process $X(t)$ is said to be ergodic in the mean if
$$\underset{T\to\infty}{\mathrm{l.i.m.}}\,\langle X(t)\rangle_T = E[X(t)] = \mu_X$$
Find $E[\langle X(t)\rangle_T]$.
Ans. $\mu_X$

Let $X(t) = A\cos(\omega_0 t + \Theta)$, where $A$ and $\omega_0$ are constants and $\Theta$ is a uniform r.v. over $(-\pi, \pi)$ (Prob. 5.20). Find the power spectral density of $X(t)$.
Ans. $S_X(\omega) = \dfrac{A^2\pi}{2}[\delta(\omega - \omega_0) + \delta(\omega + \omega_0)]$

A random process $Y(t)$ is defined by
$$Y(t) = AX(t)\cos(\omega_c t + \Theta)$$
where $A$ and $\omega_c$ are constants, $\Theta$ is a uniform r.v. over $(-\pi, \pi)$, and $X(t)$ is a zero-mean WSS random process with the autocorrelation function $R_X(\tau)$ and the power spectral density $S_X(\omega)$. Furthermore, $X(t)$ and $\Theta$ are independent. Show that $Y(t)$ is WSS, and find the power spectral density of $Y(t)$.
Ans. $S_Y(\omega) = \dfrac{A^2}{4}[S_X(\omega - \omega_c) + S_X(\omega + \omega_c)]$

Consider a discrete-time random process defined by
$$X(n) = \sum_{i=1}^{m} a_i\cos(\Omega_i n + \Theta_i)$$
where $a_i$ and $\Omega_i$ are real constants and $\Theta_i$ are independent uniform r.v.'s over $(-\pi, \pi)$. (a) Find the mean of $X(n)$. (b) Find the autocorrelation function of $X(n)$.
Ans. (a) $E[X(n)] = 0$
(b) $R_X(n, n + k) = \dfrac{1}{2}\displaystyle\sum_{i=1}^{m} a_i^2\cos(\Omega_i k)$

Consider a discrete-time WSS random process $X(n)$ with the autocorrelation function
$$R_X(k) = 10\,e^{-0.5|k|}$$
Find the power spectral density of $X(n)$.
Ans. $S_X(\Omega) = \dfrac{6.32}{1.368 - 1.213\cos\Omega} \qquad -\pi < \Omega < \pi$

Let $X(t)$ and $Y(t)$ be defined by
$$X(t) = U\cos\omega_0 t + V\sin\omega_0 t$$
$$Y(t) = V\cos\omega_0 t - U\sin\omega_0 t$$
where $\omega_0$ is constant and $U$ and $V$ are independent r.v.'s both having zero mean and variance $\sigma^2$. (a) Find the cross-correlation function of $X(t)$ and $Y(t)$. (b) Find the cross power spectral density of $X(t)$ and $Y(t)$.

Ans. (a) $R_{XY}(t, t + \tau) = -\sigma^2\sin\omega_0\tau$  (b) $S_{XY}(\omega) = j\sigma^2\pi\,[\delta(\omega - \omega_0) - \delta(\omega + \omega_0)]$

Verify Eqs. (6.36) and (6.37).
Hint: Substitute Eq. (6.18) into Eq. (6.34).

Let $Y(t) = X(t) + W(t)$, where $X(t)$ and $W(t)$ are orthogonal and $W(t)$ is a white noise specified by Eq. (6.43) or (6.45). Find the autocorrelation function of $Y(t)$.
Ans. $R_Y(t, s) = R_X(t, s) + \sigma^2\delta(t - s)$

A zero-mean WSS random process $X(t)$ is called band-limited white noise if its spectral density is given by

$$S_X(\omega) = \begin{cases} N_0/2 & |\omega| \le \omega_B \\ 0 & |\omega| > \omega_B \end{cases}$$

Find the autocorrelation function of $X(t)$.
Ans. $R_X(\tau) = \dfrac{N_0\,\omega_B}{2\pi}\,\dfrac{\sin\omega_B\tau}{\omega_B\tau}$

A WSS random process $X(t)$ is applied to the input of an LTI system with impulse response $h(t) = 3e^{-2t}u(t)$. Find the mean value of the output $Y(t)$ of the system if $E[X(t)] = 2$.
Hint: Use Eq. (6.59).
Ans. 3

The input $X(t)$ to the RC filter shown in Fig. 6-7 is a white noise specified by Eq. (6.45). Find the mean-square value of $Y(t)$.
Hint: Use Eqs. (6.64) and (6.65).
Ans. $\sigma^2/(2RC)$

Fig. 6-7 RC filter

The input $X(t)$ to a differentiator is the random telegraph signal of Prob (a) Determine the power spectral density of the differentiator output. (b) Find the mean-square value of the differentiator output.
Ans. (a) $S_Y(\omega) = \dfrac{4\lambda\omega^2}{\omega^2 + 4\lambda^2}$  (b) $E[Y^2(t)] = \infty$

Suppose that the input to the filter shown in Fig. 6-8 is a white noise specified by Eq. (6.45). Find the power spectral density of $Y(t)$.
Ans. $S_Y(\omega) = \sigma^2(1 + a^2 + 2a\cos\omega T)$
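The last answer can be checked with a short worked derivation. This is only a sketch under two assumptions that the text here does not confirm: that the filter of Fig. 6-8 forms $Y(t) = X(t) + aX(t - T)$ with real $a$, and that the white noise of Eq. (6.45) has the flat spectrum $S_X(\omega) = \sigma^2$. Both are chosen because they reproduce the stated answer.

```latex
\begin{align*}
H(\omega) &= 1 + a e^{-j\omega T} \\
|H(\omega)|^2 &= \left(1 + a e^{-j\omega T}\right)\left(1 + a e^{j\omega T}\right)
              = 1 + a^2 + 2a\cos\omega T \\
S_Y(\omega) &= |H(\omega)|^2\, S_X(\omega)
             = \sigma^2\left(1 + a^2 + 2a\cos\omega T\right)
\end{align*}
```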

[Fig. 6-8: filter containing a delay element $T$]

Verify Eq. (6.67).
Hint: Proceed as in Prob

Suppose that the input to the discrete-time filter shown in Fig. 6-9 is a discrete-time white noise with average power $\sigma^2$. Find the power spectral density of $Y(n)$.
Ans. $S_Y(\Omega) = \sigma^2(1 + a^2 + 2a\cos\Omega)$

[Fig. 6-9: discrete-time filter containing a delay element]

Using the Karhunen-Loève expansion of the Wiener process, obtain the Karhunen-Loève expansion of the white normal noise.
Hint: Take the derivative of Eq. (6.175) of Prob
Ans. In the resulting expansion the coefficients $W_n$ are independent normal r.v.'s with the same variance $\sigma^2$.

Let $Y(t) = X(t) + W(t)$, where $X(t)$ and $W(t)$ are orthogonal and $W(t)$ is a white noise specified by Eq. (6.43) or (6.45). Let $\phi_n(t)$ be the eigenfunctions of the integral equation (6.86) and $\lambda_n$ the corresponding eigenvalues. (a) Show that $\phi_n(t)$ are also the eigenfunctions of the integral equation for the Karhunen-Loève expansion of $Y(t)$ with $R_Y(t, s)$. (b) Find the corresponding eigenvalues.
Hint: Use the result of Prob
Ans. (b) $\lambda_n + \sigma^2$

Suppose that

$$X(t) = \sum_n X_n e^{jn\omega_0 t}$$

where $X_n$ are r.v.'s and $\omega_0$ is a constant. Find the Fourier transform of $X(t)$.
Ans. $\tilde{X}(\omega) = \sum_n 2\pi X_n\,\delta(\omega - n\omega_0)$

Let $\tilde{X}(\omega)$ be the Fourier transform of a continuous-time random process $X(t)$. Find the mean of $\tilde{X}(\omega)$.
Ans. $\mathscr{F}[\mu_X(t)] = \displaystyle\int_{-\infty}^{\infty} \mu_X(t)e^{-j\omega t}\,dt$, where $\mu_X(t) = E[X(t)]$

Let

$$\tilde{X}(\Omega) = \sum_{n=-\infty}^{\infty} X(n)e^{-j\Omega n}$$

where $E[X(n)] = 0$ and $E[X(n)X(k)] = \sigma_n^2\,\delta(n - k)$. Find the mean and the autocorrelation function of $\tilde{X}(\Omega)$.

Chapter 7

Estimation Theory

7.1 INTRODUCTION

In this chapter, we present classical estimation theory. There are two basic types of estimation problems. In the first type, we are interested in estimating the parameters of one or more r.v.'s, and in the second type, we are interested in estimating the value of an inaccessible r.v. $Y$ in terms of the observation of an accessible r.v. $X$.

7.2 PARAMETER ESTIMATION

Let $X$ be a r.v. with pdf $f(x)$ and $X_1, \ldots, X_n$ a set of $n$ independent r.v.'s each with pdf $f(x)$. The set of r.v.'s $(X_1, \ldots, X_n)$ is called a random sample (or sample vector) of size $n$ of $X$. Any real-valued function of a random sample $s(X_1, \ldots, X_n)$ is called a statistic.

Let $X$ be a r.v. with pdf $f(x; \theta)$ which depends on an unknown parameter $\theta$. Let $(X_1, \ldots, X_n)$ be a random sample of $X$. In this case, the joint pdf of $X_1, \ldots, X_n$ is given by

$$f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta) \qquad (7.1)$$

where $x_1, \ldots, x_n$ are the values of the observed data taken from the random sample.

An estimator of $\theta$ is any statistic $s(X_1, \ldots, X_n)$, denoted as

$$\Theta = s(X_1, \ldots, X_n) \qquad (7.2)$$

For a particular set of observations $X_1 = x_1, \ldots, X_n = x_n$, the value of the estimator $s(x_1, \ldots, x_n)$ will be called an estimate of $\theta$ and denoted by $\hat{\theta}$. Thus an estimator is a r.v. and an estimate is a particular realization of it. It is not necessary that an estimate of a parameter be one single value; instead, the estimate could be a range of values. Estimates which specify a single value are called point estimates, and estimates which specify a range of values are called interval estimates.

7.3 PROPERTIES OF POINT ESTIMATORS

A. Unbiased Estimators:

An estimator $\Theta = s(X_1, \ldots, X_n)$ is said to be an unbiased estimator of the parameter $\theta$ if

$$E(\Theta) = \theta$$

for all possible values of $\theta$. If $\Theta$ is an unbiased estimator, then its mean square error is given by

$$E[(\Theta - \theta)^2] = E\{[\Theta - E(\Theta)]^2\} = \operatorname{Var}(\Theta)$$

That is, its mean square error equals its variance.

B. Efficient Estimators:

An estimator $\Theta_1$ is said to be a more efficient estimator of the parameter $\theta$ than the estimator $\Theta_2$ if

1. $\Theta_1$ and $\Theta_2$ are both unbiased estimators of $\theta$.
2. $\operatorname{Var}(\Theta_1) < \operatorname{Var}(\Theta_2)$.

The estimator $\Theta_{MV} = s(X_1, \ldots, X_n)$ is said to be a most efficient (or minimum variance) unbiased estimator of the parameter $\theta$ if

1. It is an unbiased estimator of $\theta$.
2. $\operatorname{Var}(\Theta_{MV}) \le \operatorname{Var}(\Theta)$ for all $\Theta$. (7.5)

C. Consistent Estimators:

The estimator $\Theta_n$ of $\theta$ based on a random sample of size $n$ is said to be consistent if for any small $\varepsilon > 0$,

$$\lim_{n\to\infty} P(|\Theta_n - \theta| < \varepsilon) = 1$$

or equivalently,

$$\lim_{n\to\infty} P(|\Theta_n - \theta| \ge \varepsilon) = 0 \qquad (7.6)$$

The following two conditions are sufficient for consistency (Prob. 7.5):

1. $\lim_{n\to\infty} E(\Theta_n) = \theta$ (7.7)
2. $\lim_{n\to\infty} \operatorname{Var}(\Theta_n) = 0$ (7.8)

7.4 MAXIMUM-LIKELIHOOD ESTIMATION

Let $f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n; \theta)$ denote the joint pmf of the r.v.'s $X_1, \ldots, X_n$ when they are discrete, and let it be their joint pdf when they are continuous. Let

$$L(\theta) = f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n; \theta) \qquad (7.9)$$

Now $L(\theta)$ represents the likelihood that the values $x_1, \ldots, x_n$ will be observed when $\theta$ is the true value of the parameter. Thus $L(\theta)$ is often referred to as the likelihood function of the random sample. Let $\theta_{ML} = s(x_1, \ldots, x_n)$ be the maximizing value of $L(\theta)$; that is,

$$L(\theta_{ML}) = \max_{\theta} L(\theta)$$

Then the maximum-likelihood estimator of $\theta$ is

$$\Theta_{ML} = s(X_1, \ldots, X_n)$$

and $\theta_{ML}$ is the maximum-likelihood estimate of $\theta$.

Since $L(\theta)$ is a product of either pmf's or pdf's, it will always be positive (for the range of possible values of $\theta$). Thus $\ln L(\theta)$ can always be defined, and in determining the maximizing value of $\theta$, it is often useful to use the fact that $L(\theta)$ and $\ln L(\theta)$ have their maximum at the same value of $\theta$. Hence, we may also obtain $\theta_{ML}$ by maximizing $\ln L(\theta)$.

7.5 BAYES' ESTIMATION

Suppose that the unknown parameter $\theta$ is considered to be a r.v. having some fixed distribution or prior pdf $f(\theta)$. Then $f(\mathbf{x}; \theta)$ is now viewed as a conditional pdf and written as $f(\mathbf{x} \mid \theta)$, and we can express the joint pdf of the random sample $(X_1, \ldots, X_n)$ and $\theta$ as

$$f(x_1, \ldots, x_n, \theta) = f(x_1, \ldots, x_n \mid \theta)\,f(\theta) \qquad (7.12)$$

and the marginal pdf of the sample is given by

$$f(x_1, \ldots, x_n) = \int_{R_\theta} f(x_1, \ldots, x_n, \theta)\,d\theta \qquad (7.13)$$

where $R_\theta$ is the range of the possible values of $\theta$. The other conditional pdf,

$$f(\theta \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n, \theta)}{f(x_1, \ldots, x_n)} \qquad (7.14)$$

is referred to as the posterior pdf of $\theta$. Thus the prior pdf $f(\theta)$ represents our information about $\theta$ prior to the observation of the outcomes of $X_1, \ldots, X_n$, and the posterior pdf $f(\theta \mid x_1, \ldots, x_n)$ represents our information about $\theta$ after having observed the sample.

The conditional mean of $\theta$, defined by

$$\theta_B = E(\theta \mid x_1, \ldots, x_n) = \int_{R_\theta} \theta\,f(\theta \mid x_1, \ldots, x_n)\,d\theta \qquad (7.15)$$

is called the Bayes' estimate of $\theta$, and

$$\Theta_B = E(\theta \mid X_1, \ldots, X_n) \qquad (7.16)$$

is called the Bayes' estimator of $\theta$.

7.6 MEAN SQUARE ESTIMATION

In this section, we deal with the second type of estimation problem, that is, estimating the value of an inaccessible r.v. $Y$ in terms of the observation of an accessible r.v. $X$. In general, the estimator $\hat{Y}$ of $Y$ is given by a function of $X$, $g(X)$. Then $Y - \hat{Y} = Y - g(X)$ is called the estimation error, and there is a cost associated with this error, $C[Y - g(X)]$. We are interested in finding the function $g(X)$ that minimizes this cost. When $X$ and $Y$ are continuous r.v.'s, the mean square (m.s.) error is often used as the cost function,

$$C[Y - g(X)] = E\{[Y - g(X)]^2\} \qquad (7.17)$$

It can be shown that the estimator of $Y$ given by (Prob. 7.17),

$$\hat{Y} = g(X) = E(Y \mid X) \qquad (7.18)$$

is the best estimator in the sense that the m.s. error defined by Eq. (7.17) is a minimum.

7.7 LINEAR MEAN SQUARE ESTIMATION

Now consider the estimator $\hat{Y}$ of $Y$ given by

$$\hat{Y} = g(X) = aX + b \qquad (7.19)$$

We would like to find the values of $a$ and $b$ such that the m.s. error defined by

$$e = E[(Y - \hat{Y})^2] = E\{[Y - (aX + b)]^2\} \qquad (7.20)$$

is minimum. We maintain that $a$ and $b$ must be such that (Prob. 7.20)

$$E\{[Y - (aX + b)]X\} = 0 \qquad (7.21)$$

and $a$ and $b$ are given by

$$a = \frac{\sigma_{XY}}{\sigma_X^2} = \rho_{XY}\frac{\sigma_Y}{\sigma_X} \qquad b = \mu_Y - a\mu_X \qquad (7.22)$$

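and the minimum m.s. error $e_m$ is (Prob. 7.22)

$$e_m = \sigma_Y^2(1 - \rho_{XY}^2) \qquad (7.23)$$

where $\sigma_{XY} = \operatorname{Cov}(X, Y)$ and $\rho_{XY}$ is the correlation coefficient of $X$ and $Y$. Note that Eq. (7.21) states that the optimum linear m.s. estimator $\hat{Y} = aX + b$ of $Y$ is such that the estimation error $Y - \hat{Y} = Y - (aX + b)$ is orthogonal to the observation $X$. This is known as the orthogonality principle. The line $y = ax + b$ is often called a regression line.

Next, we consider the estimator $\hat{Y}$ of $Y$ with a linear combination of the random sample $(X_1, \ldots, X_n)$ by

$$\hat{Y} = \sum_{i=1}^{n} a_i X_i \qquad (7.24)$$

Again, we maintain that in order to produce the linear estimator with the minimum m.s. error, the coefficients $a_i$ must be such that the following orthogonality conditions are satisfied (Prob. 7.35):

$$E\left[\left(Y - \sum_{i=1}^{n} a_i X_i\right)X_j\right] = 0 \qquad j = 1, \ldots, n \qquad (7.25)$$

Solving Eq. (7.25) for $a_i$, we obtain

$$\mathbf{a} = R^{-1}\mathbf{r} \qquad (7.26)$$

where $\mathbf{a} = [a_1 \cdots a_n]^T$, $\mathbf{r} = [r_1 \cdots r_n]^T$ with $r_j = E(YX_j)$, $R = [R_{ij}]$ with $R_{ij} = E(X_iX_j)$, and $R^{-1}$ is the inverse of $R$.

Solved Problems

PROPERTIES OF POINT ESTIMATORS

7.1. Let $(X_1, \ldots, X_n)$ be a random sample of $X$ having unknown mean $\mu$. Show that the estimator of $\mu$ defined by

$$M = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} \qquad (7.27)$$

is an unbiased estimator of $\mu$. Note that $\bar{X}$ is known as the sample mean (Prob. 4.64).

By Eq. (4.108),

$$E(M) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu$$

Thus, $M$ is an unbiased estimator of $\mu$.

7.2. Let $(X_1, \ldots, X_n)$ be a random sample of $X$ having unknown mean $\mu$ and variance $\sigma^2$. Show that the estimator of $\sigma^2$ defined by

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$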

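where $\bar{X}$ is the sample mean, is a biased estimator of $\sigma^2$.

By definition, we have

$$\sigma^2 = E[(X_i - \mu)^2]$$

Now

$$E(S^2) = E\left\{\frac{1}{n}\sum_{i=1}^{n}[(X_i - \mu) - (\bar{X} - \mu)]^2\right\} = \frac{1}{n}\sum_{i=1}^{n}E[(X_i - \mu)^2] - E[(\bar{X} - \mu)^2]$$

By Eqs. (4.112) and (7.27), we have

$$E[(\bar{X} - \mu)^2] = \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

Thus

$$E(S^2) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n - 1}{n}\sigma^2 \qquad (7.29)$$

which shows that $S^2$ is a biased estimator of $\sigma^2$.

7.3. Let $(X_1, \ldots, X_n)$ be a random sample of a Poisson r.v. $X$ with unknown parameter $\lambda$.

(a) Show that

$$\Lambda_1 = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \text{and} \qquad \Lambda_2 = \tfrac{1}{2}(X_1 + X_2)$$

are both unbiased estimators of $\lambda$.
(b) Which estimator is more efficient?

(a) By Eqs. (2.42) and (4.108), we have

$$E(\Lambda_1) = \frac{1}{n}\sum_{i=1}^{n}E(X_i) = \frac{1}{n}(n\lambda) = \lambda \qquad E(\Lambda_2) = \tfrac{1}{2}[E(X_1) + E(X_2)] = \lambda$$

Thus, both estimators are unbiased estimators of $\lambda$.

(b) By Eqs. (2.43) and (4.112),

$$\operatorname{Var}(\Lambda_1) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2}(n\lambda) = \frac{\lambda}{n} \qquad \operatorname{Var}(\Lambda_2) = \tfrac{1}{4}(2\lambda) = \frac{\lambda}{2}$$

Thus, if $n > 2$, $\Lambda_1$ is a more efficient estimator of $\lambda$ than $\Lambda_2$, since $\lambda/n < \lambda/2$.

7.4. Let $(X_1, \ldots, X_n)$ be a random sample of $X$ with mean $\mu$ and variance $\sigma^2$. A linear estimator of $\mu$ is defined to be a linear function of $X_1, \ldots, X_n$, $l(X_1, \ldots, X_n)$. Show that the linear estimator defined by [Eq. (7.27)],

$$M = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$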

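is the most efficient linear unbiased estimator of $\mu$.

Assume that

$$M_1 = l(X_1, \ldots, X_n) = \sum_{i=1}^{n} a_i X_i$$

is a linear unbiased estimator of $\mu$ with lower variance than $M$. Since $M_1$ is unbiased, we must have

$$E(M_1) = \sum_{i=1}^{n} a_i E(X_i) = \mu\sum_{i=1}^{n} a_i = \mu$$

which implies that $\sum_{i=1}^{n} a_i = 1$. By Eq. (4.112),

$$\operatorname{Var}(M) = \frac{1}{n}\sigma^2 \qquad \text{and} \qquad \operatorname{Var}(M_1) = \sigma^2\sum_{i=1}^{n} a_i^2$$

By assumption,

$$\sum_{i=1}^{n} a_i^2 < \frac{1}{n} \qquad (7.30)$$

Consider the sum

$$0 \le \sum_{i=1}^{n}\left(a_i - \frac{1}{n}\right)^2 = \sum_{i=1}^{n} a_i^2 - \frac{2}{n}\sum_{i=1}^{n} a_i + \frac{1}{n} = \sum_{i=1}^{n} a_i^2 - \frac{1}{n}$$

which, by assumption (7.30), is less than 0. This is impossible unless $a_i = 1/n$, implying that $M$ is the most efficient linear unbiased estimator of $\mu$.

7.5. Show that if

$$\lim_{n\to\infty} E(\Theta_n) = \theta \qquad \text{and} \qquad \lim_{n\to\infty}\operatorname{Var}(\Theta_n) = 0$$

then the estimator $\Theta_n$ is consistent.

Using Chebyshev's inequality (2.97), we can write

$$P(|\Theta_n - \theta| \ge \varepsilon) \le \frac{E[(\Theta_n - \theta)^2]}{\varepsilon^2} = \frac{1}{\varepsilon^2}\left\{\operatorname{Var}(\Theta_n) + [E(\Theta_n) - \theta]^2\right\}$$

Thus, if

$$\lim_{n\to\infty} E(\Theta_n) = \theta \qquad \text{and} \qquad \lim_{n\to\infty}\operatorname{Var}(\Theta_n) = 0$$

then

$$\lim_{n\to\infty} P(|\Theta_n - \theta| \ge \varepsilon) = 0$$

that is, $\Theta_n$ is consistent [see Eq. (7.6)].

7.6. Let $(X_1, \ldots, X_n)$ be a random sample of a uniform r.v. $X$ over $(0, a)$, where $a$ is unknown. Show that

$$A = \max(X_1, \ldots, X_n)$$

is a consistent estimator of the parameter $a$.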

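If $X$ is uniformly distributed over $(0, a)$, then from Eqs. (2.44), (2.45), and (4.98) of Prob. 4.30, the pdf of $Z = \max(X_1, \ldots, X_n)$ is

$$f_Z(z) = n f_X(z)[F_X(z)]^{n-1} = \frac{n}{a^n}z^{n-1} \qquad 0 < z < a$$

Thus

$$E(A) = \int_0^a z f_Z(z)\,dz = \frac{n}{a^n}\int_0^a z^n\,dz = \frac{n}{n + 1}a$$

and

$$\lim_{n\to\infty} E(A) = a$$

Next,

$$E(A^2) = \int_0^a z^2 f_Z(z)\,dz = \frac{n}{n + 2}a^2$$

$$\operatorname{Var}(A) = E(A^2) - [E(A)]^2 = \frac{n a^2}{n + 2} - \frac{n^2a^2}{(n + 1)^2} = \frac{n}{(n + 2)(n + 1)^2}a^2$$

and

$$\lim_{n\to\infty}\operatorname{Var}(A) = 0$$

Thus, by Eqs. (7.7) and (7.8), $A$ is a consistent estimator of parameter $a$.

MAXIMUM-LIKELIHOOD ESTIMATION

7.7. Let $(X_1, \ldots, X_n)$ be a random sample of a binomial r.v. $X$ with parameters $(m, p)$, where $m$ is assumed to be known and $p$ unknown. Determine the maximum-likelihood estimator of $p$.

The likelihood function is given by [Eq. (2.36)]

$$L(p) = f(x_1, \ldots, x_n; p) = \prod_{i=1}^{n}\binom{m}{x_i}p^{x_i}(1 - p)^{m - x_i} = c\,p^{\sum_i x_i}(1 - p)^{mn - \sum_i x_i}$$

Taking the natural logarithm of the above expression, we get

$$\ln L(p) = \ln c + \left(\sum_{i=1}^{n} x_i\right)\ln p + \left(mn - \sum_{i=1}^{n} x_i\right)\ln(1 - p)$$

where

$$c = \prod_{i=1}^{n}\binom{m}{x_i}$$

and

$$\frac{d}{dp}\ln L(p) = \frac{1}{p}\sum_{i=1}^{n} x_i - \frac{1}{1 - p}\left(mn - \sum_{i=1}^{n} x_i\right)$$

Setting $d[\ln L(p)]/dp = 0$, the maximum-likelihood estimate $p_{ML}$ of $p$ is obtained as

$$p_{ML} = \frac{1}{mn}\sum_{i=1}^{n} x_i$$

Hence, the maximum-likelihood estimator of $p$ is given by

$$P_{ML} = \frac{1}{mn}\sum_{i=1}^{n} X_i = \frac{1}{m}\bar{X} \qquad (7.34)$$

7.8. Let $(X_1, \ldots, X_n)$ be a random sample of a Poisson r.v. with unknown parameter $\lambda$. Determine the maximum-likelihood estimator of $\lambda$.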

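The likelihood function is given by [Eq. (2.40)]

$$L(\lambda) = f(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^{n}\frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\lambda^{\sum_i x_i}}{c}$$

Thus,

$$\ln L(\lambda) = -n\lambda + \ln\lambda\sum_{i=1}^{n} x_i - \ln c$$

where

$$c = \prod_{i=1}^{n}(x_i!)$$

and

$$\frac{d}{d\lambda}\ln L(\lambda) = -n + \frac{1}{\lambda}\sum_{i=1}^{n} x_i$$

Setting $d[\ln L(\lambda)]/d\lambda = 0$, the maximum-likelihood estimate $\lambda_{ML}$ of $\lambda$ is obtained as

$$\lambda_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Hence, the maximum-likelihood estimator of $\lambda$ is given by

$$\Lambda_{ML} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$

7.9. Let $(X_1, \ldots, X_n)$ be a random sample of an exponential r.v. $X$ with unknown parameter $\lambda$. Determine the maximum-likelihood estimator of $\lambda$.

The likelihood function is given by [Eq. (2.48)]

$$L(\lambda) = f(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^{n}\lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum_i x_i}$$

Thus,

$$\ln L(\lambda) = n\ln\lambda - \lambda\sum_{i=1}^{n} x_i$$

and

$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i$$

Setting $d[\ln L(\lambda)]/d\lambda = 0$, the maximum-likelihood estimate $\lambda_{ML}$ of $\lambda$ is obtained as

$$\lambda_{ML} = \frac{n}{\sum_{i=1}^{n} x_i}$$

Hence, the maximum-likelihood estimator of $\lambda$ is given by

$$\Lambda_{ML} = \frac{n}{\sum_{i=1}^{n} X_i} = \frac{1}{\bar{X}}$$

7.10. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. $X$ with unknown mean $\mu$ and unknown variance $\sigma^2$. Determine the maximum-likelihood estimators of $\mu$ and $\sigma^2$.

The likelihood function is given by [Eq. (2.52)]

$$L(\mu, \sigma) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right] = \left(\frac{1}{2\pi}\right)^{n/2}\frac{1}{\sigma^n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right]$$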

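Thus,

$$\ln L(\mu, \sigma) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

In order to find the values of $\mu$ and $\sigma$ maximizing the above, we compute

$$\frac{\partial}{\partial\mu}\ln L(\mu, \sigma) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) \qquad \frac{\partial}{\partial\sigma}\ln L(\mu, \sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2$$

Equating these equations to zero, we get

$$\sum_{i=1}^{n}(x_i - \mu_{ML}) = 0 \qquad \frac{1}{\sigma_{ML}^3}\sum_{i=1}^{n}(x_i - \mu_{ML})^2 = \frac{n}{\sigma_{ML}}$$

Solving for $\mu_{ML}$ and $\sigma_{ML}$, the maximum-likelihood estimates of $\mu$ and $\sigma^2$ are given, respectively, by

$$\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \sigma_{ML}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_{ML})^2$$

Hence, the maximum-likelihood estimators of $\mu$ and $\sigma^2$ are given, respectively, by

$$M_{ML} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} \qquad S_{ML}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

BAYES' ESTIMATION

7.11. Let $(X_1, \ldots, X_n)$ be the random sample of a Bernoulli r.v. $X$ with pmf given by [Eq. (2.32)]

$$f(x; p) = p^x(1 - p)^{1 - x} \qquad x = 0, 1 \qquad (7.43)$$

where $p$, $0 \le p \le 1$, is unknown. Assume that $p$ is a uniform r.v. over (0, 1). Find the Bayes' estimator of $p$.

The prior pdf of $p$ is the uniform pdf; that is,

$$f(p) = 1 \qquad 0 < p < 1$$

The posterior pdf of $p$ is given by

$$f(p \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n, p)}{f(x_1, \ldots, x_n)}$$

Then, by Eq. (7.12),

$$f(x_1, \ldots, x_n, p) = f(x_1, \ldots, x_n \mid p)\,f(p) = p^{\sum_i x_i}(1 - p)^{n - \sum_i x_i} = p^m(1 - p)^{n - m}$$

where $m = \sum_{i=1}^{n} x_i$, and by Eq. (7.13),

$$f(x_1, \ldots, x_n) = \int_0^1 p^m(1 - p)^{n - m}\,dp$$

Now, from calculus, for integers $m$ and $k$, we have

$$\int_0^1 p^m(1 - p)^k\,dp = \frac{m!\,k!}{(m + k + 1)!} \qquad (7.44)$$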

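Thus, by Eq. (7.14), the posterior pdf of $p$ is

$$f(p \mid x_1, \ldots, x_n) = \frac{(n + 1)!\,p^m(1 - p)^{n - m}}{m!\,(n - m)!}$$

and by Eqs. (7.15) and (7.44),

$$E(p \mid x_1, \ldots, x_n) = \int_0^1 p\,f(p \mid x_1, \ldots, x_n)\,dp = \frac{(n + 1)!}{m!\,(n - m)!}\int_0^1 p^{m+1}(1 - p)^{n - m}\,dp = \frac{m + 1}{n + 2} = \frac{1}{n + 2}\left(\sum_{i=1}^{n} x_i + 1\right)$$

Hence, by Eq. (7.16), the Bayes' estimator of $p$ is

$$P_B = E(p \mid X_1, \ldots, X_n) = \frac{1}{n + 2}\left(\sum_{i=1}^{n} X_i + 1\right)$$

7.12. Let $(X_1, \ldots, X_n)$ be a random sample of an exponential r.v. $X$ with unknown parameter $\lambda$. Assume that $\lambda$ is itself an exponential r.v. with parameter $\alpha$. Find the Bayes' estimator of $\lambda$.

The assumed prior pdf of $\lambda$ is [Eq. (2.48)]

$$f(\lambda) = \begin{cases}\alpha e^{-\alpha\lambda} & \alpha, \lambda > 0 \\ 0 & \text{otherwise}\end{cases}$$

Now

$$f(x_1, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n}\lambda e^{-\lambda x_i} = \lambda^n e^{-m\lambda}$$

where $m = \sum_{i=1}^{n} x_i$. Then, by Eqs. (7.12) and (7.13),

$$f(x_1, \ldots, x_n) = \int_0^\infty \lambda^n e^{-m\lambda}\,\alpha e^{-\alpha\lambda}\,d\lambda = \alpha\int_0^\infty \lambda^n e^{-(\alpha + m)\lambda}\,d\lambda = \alpha\frac{n!}{(\alpha + m)^{n+1}}$$

By Eq. (7.14), the posterior pdf of $\lambda$ is given by

$$f(\lambda \mid x_1, \ldots, x_n) = \frac{(\alpha + m)^{n+1}\lambda^n e^{-(\alpha + m)\lambda}}{n!}$$

Thus, by Eq. (7.15), the Bayes' estimate of $\lambda$ is

$$\lambda_B = E(\lambda \mid x_1, \ldots, x_n) = \int_0^\infty \lambda\,f(\lambda \mid x_1, \ldots, x_n)\,d\lambda = \frac{(\alpha + m)^{n+1}}{n!}\cdot\frac{(n + 1)!}{(\alpha + m)^{n+2}} = \frac{n + 1}{\alpha + \sum_{i=1}^{n} x_i}$$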

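and the Bayes' estimator of $\lambda$ is

$$\Lambda_B = \frac{n + 1}{\alpha + \sum_{i=1}^{n} X_i}$$

7.13. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. $X$ with unknown mean $\mu$ and variance 1. Assume that $\mu$ is itself a normal r.v. with mean 0 and variance 1. Find the Bayes' estimator of $\mu$.

The assumed prior pdf of $\mu$ is

$$f(\mu) = \frac{1}{\sqrt{2\pi}}e^{-\mu^2/2}$$

Then by Eq. (7.12),

$$f(x_1, \ldots, x_n, \mu) = f(x_1, \ldots, x_n \mid \mu)\,f(\mu) = \frac{1}{(2\pi)^{(n+1)/2}}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2 - \frac{\mu^2}{2}\right]$$

$$= \frac{1}{(2\pi)^{(n+1)/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^{n} x_i^2\right)\exp\left[\frac{1}{2(n + 1)}\left(\sum_{i=1}^{n} x_i\right)^2\right]\exp\left[-\frac{n + 1}{2}\left(\mu - \frac{1}{n + 1}\sum_{i=1}^{n} x_i\right)^2\right]$$

Then, by Eq. (7.14), the posterior pdf of $\mu$ is given by

$$f(\mu \mid x_1, \ldots, x_n) = C\exp\left[-\frac{n + 1}{2}\left(\mu - \frac{1}{n + 1}\sum_{i=1}^{n} x_i\right)^2\right] \qquad (7.48)$$

where $C = C(x_1, \ldots, x_n)$ is independent of $\mu$. However, Eq. (7.48) is just the pdf of a normal r.v. with mean

$$\frac{1}{n + 1}\sum_{i=1}^{n} x_i$$

and variance

$$\frac{1}{n + 1}$$

Hence, the conditional distribution of $\mu$ given $x_1, \ldots, x_n$ is the normal distribution with this mean and variance. Thus, the Bayes' estimate of $\mu$ is given by

$$\mu_B = E(\mu \mid x_1, \ldots, x_n) = \frac{1}{n + 1}\sum_{i=1}^{n} x_i$$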

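and the Bayes' estimator of $\mu$ is

$$M_B = \frac{1}{n + 1}\sum_{i=1}^{n} X_i = \frac{n}{n + 1}\bar{X}$$

7.14. Let $(X_1, \ldots, X_n)$ be a random sample of a r.v. $X$ with pdf $f(x; \theta)$, where $\theta$ is an unknown parameter. The statistics $L$ and $U$ determine a $100(1 - \alpha)$ percent confidence interval $(L, U)$ for the parameter $\theta$ if

$$P(L < \theta < U) \ge 1 - \alpha \qquad 0 < \alpha < 1 \qquad (7.51)$$

and $1 - \alpha$ is called the confidence coefficient. Find $L$ and $U$ if $X$ is a normal r.v. with known variance $\sigma^2$ and mean $\mu$ is an unknown parameter.

If $X = N(\mu; \sigma^2)$, then

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad \text{where} \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is a standard normal r.v., and hence for a given $\alpha$ we can find a number $z_{\alpha/2}$ from Table A (Appendix A) such that

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$

For example, if $1 - \alpha = 0.95$, then $z_{\alpha/2} = z_{0.025} = 1.96$, and if $1 - \alpha = 0.9$, then $z_{\alpha/2} = z_{0.05} = 1.645$. Now, recalling that $\sigma > 0$, we have the following equivalent inequality relationships:

$$-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}$$

$$-z_{\alpha/2}(\sigma/\sqrt{n}) < \bar{X} - \mu < z_{\alpha/2}(\sigma/\sqrt{n})$$

$$\bar{X} - z_{\alpha/2}(\sigma/\sqrt{n}) < \mu < \bar{X} + z_{\alpha/2}(\sigma/\sqrt{n})$$

Thus, we have

$$P[\bar{X} - z_{\alpha/2}(\sigma/\sqrt{n}) < \mu < \bar{X} + z_{\alpha/2}(\sigma/\sqrt{n})] = 1 - \alpha \qquad (7.54)$$

and so

$$L = \bar{X} - z_{\alpha/2}(\sigma/\sqrt{n}) \qquad U = \bar{X} + z_{\alpha/2}(\sigma/\sqrt{n})$$

7.15. Consider a normal r.v. with variance 1.66 and unknown mean $\mu$. Find the 95 percent confidence interval for the mean based on a random sample of size 10.

As shown in Prob. 7.14, for $1 - \alpha = 0.95$, we have $z_{\alpha/2} = z_{0.025} = 1.96$ and

$$z_{\alpha/2}(\sigma/\sqrt{n}) = 1.96(\sqrt{1.66}/\sqrt{10}) = 0.8$$

Thus, by Eq. (7.54), the 95 percent confidence interval for $\mu$ is

$$(\bar{X} - 0.8,\ \bar{X} + 0.8)$$

MEAN SQUARE ESTIMATION

7.16. Find the m.s. estimate of a r.v. $Y$ by a constant $c$.

By Eq. (7.17), the m.s. error is

$$e = E[(Y - c)^2] = \int_{-\infty}^{\infty}(y - c)^2 f(y)\,dy \qquad (7.55)$$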

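Clearly the m.s. error $e$ depends on $c$, and it is minimum if

$$\frac{de}{dc} = -2\int_{-\infty}^{\infty}(y - c)f(y)\,dy = 0$$

or

$$c\int_{-\infty}^{\infty} f(y)\,dy = c = \int_{-\infty}^{\infty} y f(y)\,dy$$

Thus, we conclude that the m.s. estimate $c$ of $Y$ is given by

$$c = E(Y) \qquad (7.56)$$

7.17. Find the m.s. estimator of a r.v. $Y$ by a function $g(X)$ of the r.v. $X$.

By Eq. (7.17), the m.s. error is

$$e = E\{[Y - g(X)]^2\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}[y - g(x)]^2 f(x, y)\,dx\,dy$$

Since $f(x, y) = f(y \mid x)f(x)$, we can write

$$e = \int_{-\infty}^{\infty} f(x)\left\{\int_{-\infty}^{\infty}[y - g(x)]^2 f(y \mid x)\,dy\right\}dx$$

Since the integrands above are positive, the m.s. error $e$ is minimum if the inner integrand,

$$\int_{-\infty}^{\infty}[y - g(x)]^2 f(y \mid x)\,dy \qquad (7.58)$$

is minimum for every $x$. Comparing Eq. (7.58) with Eq. (7.55) (Prob. 7.16), we see that they are the same form if $c$ is changed to $g(x)$ and $f(y)$ is changed to $f(y \mid x)$. Thus, by the result of Prob. 7.16 [Eq. (7.56)], we conclude that the m.s. estimate of $Y$ is given by

$$g(x) = \int_{-\infty}^{\infty} y f(y \mid x)\,dy = E(Y \mid x)$$

Hence, the m.s. estimator of $Y$ is

$$\hat{Y} = g(X) = E(Y \mid X)$$

7.18. Find the m.s. error if $g(x) = E(Y \mid x)$ is the m.s. estimate of $Y$.

As we see from Eq. (3.58), the conditional mean $E(Y \mid x)$ of $Y$, given that $X = x$, is a function of $x$, and by Eq. (4.39),

$$E[E(Y \mid X)] = E(Y) \qquad (7.61)$$

Similarly, the conditional mean $E[g(X, Y) \mid x]$ of $g(X, Y)$, given that $X = x$, is a function of $x$. It defines, therefore, the function $E[g(X, Y) \mid X]$ of the r.v. $X$. Then

$$E\{E[g(X, Y) \mid X]\} = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(x, y) f(y \mid x)\,dy\right]f(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy = E[g(X, Y)] \qquad (7.62)$$

Note that Eq. (7.62) is the generalization of Eq. (7.61). Next, we note that

$$E[g_1(X)g_2(Y) \mid x] = g_1(x)E[g_2(Y) \mid x] \qquad (7.63)$$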

Then by Eqs. (7.62) and (7.63), we have

$$E[g_1(X)g_2(Y)] = E\{E[g_1(X)g_2(Y) \mid X]\} = E\{g_1(X)E[g_2(Y) \mid X]\} \qquad (7.64)$$

Now, setting $g_1(X) = g(X)$ and $g_2(Y) = Y$ in Eq. (7.64), and using Eq. (7.18), we obtain

$$E[g(X)Y] = E[g(X)E(Y \mid X)] = E[g^2(X)]$$

Thus, the m.s. error is given by

$$e = E\{[Y - g(X)]^2\} = E(Y^2) - 2E[g(X)Y] + E[g^2(X)] = E(Y^2) - E[g^2(X)]$$

7.19. Let $Y = X^2$ and $X$ be a uniform r.v. over $(-1, 1)$. Find the m.s. estimator of $Y$ in terms of $X$ and its m.s. error.

By Eq. (7.18), the m.s. estimate of $Y$ is given by

$$g(x) = E(Y \mid x) = E(X^2 \mid X = x) = x^2$$

Hence, the m.s. estimator of $Y$ is

$$\hat{Y} = X^2$$

The m.s. error is

$$e = E\{[Y - g(X)]^2\} = E\{[X^2 - X^2]^2\} = 0$$

LINEAR MEAN SQUARE ESTIMATION

7.20. Derive the orthogonality principle (7.21) and Eq. (7.22).

By Eq. (7.20), the m.s. error is

$$e(a, b) = E\{[Y - (aX + b)]^2\}$$

Clearly, the m.s. error $e$ is a function of $a$ and $b$, and it is minimum if $\partial e/\partial a = 0$ and $\partial e/\partial b = 0$. Now

$$\frac{\partial e}{\partial a} = E\{2[Y - (aX + b)](-X)\} = -2E\{[Y - (aX + b)]X\}$$

$$\frac{\partial e}{\partial b} = E\{2[Y - (aX + b)](-1)\} = -2E\{[Y - (aX + b)]\}$$

Setting $\partial e/\partial a = 0$ and $\partial e/\partial b = 0$, we obtain

$$E\{[Y - (aX + b)]X\} = 0 \qquad (7.68)$$

$$E[Y - (aX + b)] = 0 \qquad (7.69)$$

Note that Eq. (7.68) is the orthogonality principle (7.21). Rearranging Eqs. (7.68) and (7.69), we get

$$E(X^2)a + E(X)b = E(XY)$$

$$E(X)a + b = E(Y)$$

Solving for $a$ and $b$, we obtain Eq. (7.22); that is,

$$a = \frac{E(XY) - E(X)E(Y)}{E(X^2) - [E(X)]^2} = \frac{\sigma_{XY}}{\sigma_X^2} = \rho_{XY}\frac{\sigma_Y}{\sigma_X}$$

$$b = E(Y) - aE(X) = \mu_Y - a\mu_X$$

where we have used Eqs. (2.31), (3.51), and (3.53).

7.21. Show that the m.s. error defined by Eq. (7.20) is minimum when Eqs. (7.68) and (7.69) are satisfied.

Assume that $\hat{Y} = cX + d$, where $c$ and $d$ are arbitrary constants. Then

$$e(c, d) = E\{[Y - (cX + d)]^2\} = E\{[Y - (aX + b) + (a - c)X + (b - d)]^2\}$$

$$= E\{[Y - (aX + b)]^2\} + E\{[(a - c)X + (b - d)]^2\} + 2(a - c)E\{[Y - (aX + b)]X\} + 2(b - d)E\{[Y - (aX + b)]\}$$

$$= e(a, b) + E\{[(a - c)X + (b - d)]^2\} + 2(a - c)E\{[Y - (aX + b)]X\} + 2(b - d)E\{[Y - (aX + b)]\}$$

The last two terms on the right-hand side are zero when Eqs. (7.68) and (7.69) are satisfied, and the second term on the right-hand side is the mean of a square and hence nonnegative. Thus, $e(c, d) \ge e(a, b)$ for any $c$ and $d$. Hence, $e(a, b)$ is minimum.

7.22. Derive Eq. (7.23).

By Eqs. (7.68) and (7.69), we have

$$E\{[Y - (aX + b)]aX\} = 0 = E\{[Y - (aX + b)]b\}$$

Then

$$e_m = e(a, b) = E\{[Y - (aX + b)]^2\} = E\{[Y - (aX + b)][Y - (aX + b)]\} = E\{[Y - (aX + b)]Y\} = E(Y^2) - aE(XY) - bE(Y)$$

Using Eqs. (2.31), (3.51), and (3.53), and substituting the values of $a$ and $b$ [Eq. (7.22)] in the above expression, the minimum m.s. error is

$$e_m = \sigma_Y^2 - a\sigma_{XY} = \sigma_Y^2 - \frac{\sigma_{XY}^2}{\sigma_X^2} = \sigma_Y^2(1 - \rho_{XY}^2)$$

which is Eq. (7.23).

7.23. Let $Y = X^2$, and let $X$ be a uniform r.v. over $(-1, 1)$ (see Prob. 7.19). Find the linear m.s. estimator of $Y$ in terms of $X$ and its m.s. error.

The linear m.s. estimator of $Y$ in terms of $X$ is

$$\hat{Y} = aX + b$$

where $a$ and $b$ are given by [Eq. (7.22)]

$$a = \frac{\sigma_{XY}}{\sigma_X^2} \qquad b = \mu_Y - a\mu_X$$

Now, by Eqs. (2.46) and (2.44),

$$\mu_X = E(X) = 0 \qquad E(XY) = E(X^3) = \frac{1}{2}\int_{-1}^{1} x^3\,dx = 0$$

By Eq. (3.51),

$$\sigma_{XY} = \operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0$$

Thus, $a = 0$ and $b = E(Y)$, and the linear m.s. estimator of $Y$ is

$$\hat{Y} = b = E(Y)$$

and the m.s. error is

$$e = E\{[Y - E(Y)]^2\} = \sigma_Y^2$$

7.24. Find the minimum m.s. error estimator of $Y$ in terms of $X$ when $X$ and $Y$ are jointly normal r.v.'s.

By Eq. (7.18), the minimum m.s. error estimator of $Y$ in terms of $X$ is

$$\hat{Y} = E(Y \mid X)$$

Now, when $X$ and $Y$ are jointly normal, by Eq. (3.108) (Prob. 3.51), we have

$$E(Y \mid x) = \rho_{XY}\frac{\sigma_Y}{\sigma_X}x + \mu_Y - \rho_{XY}\frac{\sigma_Y}{\sigma_X}\mu_X$$

Hence, the minimum m.s. error estimator of $Y$ is

$$\hat{Y} = E(Y \mid X) = \rho_{XY}\frac{\sigma_Y}{\sigma_X}X + \mu_Y - \rho_{XY}\frac{\sigma_Y}{\sigma_X}\mu_X \qquad (7.72)$$

Comparing Eq. (7.72) with Eqs. (7.19) and (7.22), we see that for jointly normal r.v.'s the linear m.s. estimator is the minimum m.s. error estimator.

Supplementary Problems

7.25. Let $(X_1, \ldots, X_n)$ be a random sample of $X$ having unknown mean $\mu$ and variance $\sigma^2$. Show that the estimator of $\sigma^2$ defined by

$$S_1^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

where $\bar{X}$ is the sample mean, is an unbiased estimator of $\sigma^2$. Note that $S_1^2$ is often called the sample variance.
Hint: Show that $S_1^2 = \dfrac{n}{n - 1}S^2$, and use Eq. (7.29).

7.26. Let $(X_1, \ldots, X_n)$ be a random sample of $X$ having known mean $\mu$ and unknown variance $\sigma^2$. Show that the estimator of $\sigma^2$ defined by

$$\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2$$

is an unbiased estimator of $\sigma^2$.
Hint: Proceed as in Prob

7.27. Let $(X_1, \ldots, X_n)$ be a random sample of a binomial r.v. $X$ with parameters $(m, p)$, where $p$ is unknown. Show that the maximum-likelihood estimator of $p$ given by Eq. (7.34) is unbiased.
Hint: Use Eq. (2.38).

7.28. Let $(X_1, \ldots, X_n)$ be a random sample of a Bernoulli r.v. $X$ with pmf $f(x; p) = p^x(1 - p)^{1 - x}$, $x = 0, 1$, where $p$, $0 \le p \le 1$, is unknown. Find the maximum-likelihood estimator of $p$.
Ans. $P_{ML} = \dfrac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$

7.29. The values of a random sample, 2.9, 0.5, 1.7, 4.3, and 3.2, are obtained from a r.v. $X$ that is uniformly distributed over the unknown interval $(a, b)$. Find the maximum-likelihood estimates of $a$ and $b$.
Ans. $a_{ML} = \min_i x_i = 0.5$, $b_{ML} = \max_i x_i = 4.3$
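The uniform-sample problem just above can be illustrated numerically. The sketch below assumes the standard likelihood for $n$ independent uniform observations, $(b - a)^{-n}$ when every observation lies in the interval and 0 otherwise, and treats the interval as closed so that the maximum is attained. The function name `uniform_likelihood` is hypothetical, introduced only for this illustration.

```python
def uniform_likelihood(a, b, sample):
    """Likelihood of i.i.d. uniform data on [a, b] (illustrative helper, not from the text).

    Returns (b - a)^(-n) if every observation lies in [a, b], else 0.
    """
    if a >= b or min(sample) < a or max(sample) > b:
        return 0.0
    return (b - a) ** (-len(sample))


sample = [2.9, 0.5, 1.7, 4.3, 3.2]

# The tightest interval that still contains every observation maximizes the likelihood.
a_ml, b_ml = min(sample), max(sample)
print(a_ml, b_ml)

# A wider interval lowers the likelihood; an interval that excludes a point gives zero.
print(uniform_likelihood(a_ml, b_ml, sample))
print(uniform_likelihood(0.4, 4.4, sample))
print(uniform_likelihood(0.6, 4.3, sample))
```

Shrinking the interval raises $(b - a)^{-n}$ until an observation would be excluded, which is why the estimates sit at the sample minimum and maximum.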

7.30. In analyzing the flow of traffic through a drive-in bank, the times (in minutes) between arrivals of 10 customers are recorded as 3.2, 2.1, 5.3, 4.2, 1.2, 2.8, 6.4, 1.5, 1.9, and 3.0. Assuming that the interarrival time is an exponential r.v. with parameter $\lambda$, find the maximum-likelihood estimate of $\lambda$.
Ans. $\lambda_{ML} = \dfrac{1}{3.16}$

7.31. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. $X$ with known mean $\mu$ and unknown variance $\sigma^2$. Find the maximum-likelihood estimator of $\sigma^2$.
Ans. $S_{ML}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2$

7.32. Let $(X_1, \ldots, X_n)$ be the random sample of a normal r.v. $X$ with mean $\mu$ and variance $\sigma^2$, where $\mu$ is unknown. Assume that $\mu$ is itself a normal r.v. with mean $\mu_1$ and variance $\sigma_1^2$. Find the Bayes' estimate of $\mu$.

7.33. Let $(X_1, \ldots, X_n)$ be the random sample of a normal r.v. $X$ with variance 100 and unknown $\mu$. What sample size $n$ is required such that the width of the 95 percent confidence interval is 5?
Ans. $n = 62$

7.34. Find a constant $a$ such that if $Y$ is estimated by $aX$, the m.s. error is minimum, and also find the minimum m.s. error $e_m$.
Ans. $a = E(XY)/E(X^2)$, $e_m = E(Y^2) - [E(XY)]^2/E(X^2)$

7.35. Derive Eqs. (7.25) and (7.26).
Hint: Proceed as in Prob
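The sample-size problem above (variance 100, 95 percent interval of width 5) follows directly from Eq. (7.54): the interval has width $2z_{\alpha/2}\sigma/\sqrt{n}$, so the smallest adequate $n$ is the ceiling of $(2z_{\alpha/2}\sigma/\text{width})^2$. A minimal sketch of that arithmetic follows; the function name `required_sample_size` is hypothetical, and $z_{0.025} = 1.96$ is the table value quoted in Prob. 7.14.

```python
import math


def required_sample_size(sigma, width, z):
    """Smallest n such that 2 * z * sigma / sqrt(n) <= width (illustrative helper)."""
    return math.ceil((2.0 * z * sigma / width) ** 2)


# Variance 100 means sigma = 10; z = 1.96 for a 95 percent interval.
n = required_sample_size(sigma=math.sqrt(100.0), width=5.0, z=1.96)
print(n)
```

Here $(2 \times 1.96 \times 10 / 5)^2 = 7.84^2 \approx 61.47$, which rounds up to the stated answer.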

Chapter 8

Decision Theory

8.1 INTRODUCTION

There are many situations in which we have to make decisions based on observations or data that are random variables. The theory behind the solutions for these situations is known as decision theory or hypothesis testing. In communication or radar technology, decision theory or hypothesis testing is known as (signal) detection theory. In this chapter we present a brief review of the binary decision theory and various decision tests.

8.2 HYPOTHESIS TESTING

A. Definitions:

A statistical hypothesis is an assumption about the probability law of r.v.'s. Suppose we observe a random sample $(X_1, \ldots, X_n)$ of a r.v. $X$ whose pdf $f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n; \theta)$ depends on a parameter $\theta$. We wish to test the assumption $\theta = \theta_0$ against the assumption $\theta = \theta_1$. The assumption $\theta = \theta_0$ is denoted by $H_0$ and is called the null hypothesis. The assumption $\theta = \theta_1$ is denoted by $H_1$ and is called the alternative hypothesis.

$H_0$: $\theta = \theta_0$ (Null hypothesis)
$H_1$: $\theta = \theta_1$ (Alternative hypothesis)

A hypothesis is called simple if all parameters are specified exactly. Otherwise it is called composite. Thus, suppose $H_0$: $\theta = \theta_0$ and $H_1$: $\theta \ne \theta_0$; then $H_0$ is simple and $H_1$ is composite.

B. Hypothesis Testing and Types of Errors:

Hypothesis testing is a decision process establishing the validity of a hypothesis. We can think of the decision process as dividing the observation space $R^n$ (Euclidean $n$-space) into two regions $R_0$ and $R_1$. Let $\mathbf{x} = (x_1, \ldots, x_n)$ be the observed vector. Then if $\mathbf{x} \in R_0$, we will decide on $H_0$; if $\mathbf{x} \in R_1$, we decide on $H_1$. The region $R_0$ is known as the acceptance region and the region $R_1$ as the rejection (or critical) region (since the null hypothesis is rejected). Thus, with the observation vector (or data), one of the following four actions can happen:

1. $H_0$ true; accept $H_0$
2. $H_0$ true; reject $H_0$ (or accept $H_1$)
3. $H_1$ true; accept $H_1$
4. $H_1$ true; reject $H_1$ (or accept $H_0$)

The first and third actions correspond to correct decisions, and the second and fourth actions correspond to errors. The errors are classified as

1. Type I error: Reject $H_0$ (or accept $H_1$) when $H_0$ is true.
2. Type II error: Reject $H_1$ (or accept $H_0$) when $H_1$ is true.

Let $P_I$ and $P_{II}$ denote, respectively, the probabilities of Type I and Type II errors:

$$P_I = P(D_1 \mid H_0) = P(\mathbf{x} \in R_1; H_0)$$

$$P_{II} = P(D_0 \mid H_1) = P(\mathbf{x} \in R_0; H_1)$$

where $D_i$ ($i = 0, 1$) denotes the event that the decision is made to accept $H_i$. $P_I$ is often denoted by $\alpha$ and is known as the level of significance, and $P_{II}$ is denoted by $\beta$ and $(1 - \beta)$ is known as the power of the test. Note that since $\alpha$ and $\beta$ represent probabilities of events from the same decision problem, they are not independent of each other or of the sample size $n$. It would be desirable to have a decision process such that both $\alpha$ and $\beta$ will be small. However, in general, a decrease in one type of error leads to an increase in the other type for a fixed sample size (Prob. 8.4). The only way to simultaneously reduce both types of errors is to increase the sample size (Prob. 8.5). One might also attach some relative importance (or cost) to the four possible courses of action and minimize the total cost of the decision (see Sec. 8.3D).

The probabilities of correct decisions (actions 1 and 3) may be expressed as

$$P(D_0 \mid H_0) = 1 - \alpha \qquad P(D_1 \mid H_1) = 1 - \beta$$

In radar signal detection, the two hypotheses are

$H_0$: No target exists
$H_1$: Target is present

In this case, the probability of a Type I error $P_I = P(D_1 \mid H_0)$ is often referred to as the false-alarm probability (denoted by $P_F$), the probability of a Type II error $P_{II} = P(D_0 \mid H_1)$ as the miss probability (denoted by $P_M$), and $P(D_1 \mid H_1)$ as the detection probability (denoted by $P_D$). The cost of failing to detect a target cannot be easily determined. In general we set a value of $P_F$ which is acceptable and seek a decision test that constrains $P_F$ to this value while maximizing $P_D$ (or equivalently minimizing $P_M$). This test is known as the Neyman-Pearson test (see Sec. 8.3C).

8.3 DECISION TESTS

A. Maximum-Likelihood Test:

Let $\mathbf{x}$ be the observation vector and $P(\mathbf{x} \mid H_i)$, $i = 0, 1$, denote the probability of observing $\mathbf{x}$ given that $H_i$ was true. In the maximum-likelihood test, the decision regions $R_0$ and $R_1$ are selected as

$$R_0 = \{\mathbf{x}: P(\mathbf{x} \mid H_0) > P(\mathbf{x} \mid H_1)\}$$

$$R_1 = \{\mathbf{x}: P(\mathbf{x} \mid H_0) < P(\mathbf{x} \mid H_1)\}$$

Thus, the maximum-likelihood test can be expressed as

$$d(\mathbf{x}) = \begin{cases} H_0 & \text{if } P(\mathbf{x} \mid H_0) > P(\mathbf{x} \mid H_1) \\ H_1 & \text{if } P(\mathbf{x} \mid H_0) < P(\mathbf{x} \mid H_1) \end{cases}$$

The above decision test can be rewritten as

$$\frac{P(\mathbf{x} \mid H_1)}{P(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \qquad (8.7)$$

If we define the likelihood ratio $\Lambda(\mathbf{x})$ as

$$\Lambda(\mathbf{x}) = \frac{P(\mathbf{x} \mid H_1)}{P(\mathbf{x} \mid H_0)} \qquad (8.8)$$

then the maximum-likelihood test (8.7) can be expressed as

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

which is called the likelihood ratio test, and 1 is called the threshold value of the test. Note that the likelihood ratio $\Lambda(\mathbf{x})$ is also often expressed as

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)}$$

B. MAP Test:

Let $P(H_i \mid \mathbf{x})$, $i = 0, 1$, denote the probability that $H_i$ was true given a particular value of $\mathbf{x}$. The conditional probability $P(H_i \mid \mathbf{x})$ is called a posteriori (or posterior) probability, that is, a probability that is computed after an observation has been made. The probability $P(H_i)$, $i = 0, 1$, is called a priori (or prior) probability. In the maximum a posteriori (MAP) test, the decision regions $R_0$ and $R_1$ are selected as

$$R_0 = \{\mathbf{x}: P(H_0 \mid \mathbf{x}) > P(H_1 \mid \mathbf{x})\}$$

$$R_1 = \{\mathbf{x}: P(H_0 \mid \mathbf{x}) < P(H_1 \mid \mathbf{x})\}$$

Thus, the MAP test is given by

$$d(\mathbf{x}) = \begin{cases} H_0 & \text{if } P(H_0 \mid \mathbf{x}) > P(H_1 \mid \mathbf{x}) \\ H_1 & \text{if } P(H_0 \mid \mathbf{x}) < P(H_1 \mid \mathbf{x}) \end{cases}$$

which can be rewritten as

$$\frac{P(H_1 \mid \mathbf{x})}{P(H_0 \mid \mathbf{x})} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \qquad (8.13)$$

Using Bayes' rule [Eq. (1.42)], Eq. (8.13) reduces to

$$\frac{P(\mathbf{x} \mid H_1)P(H_1)}{P(\mathbf{x} \mid H_0)P(H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

Using the likelihood ratio $\Lambda(\mathbf{x})$ defined in Eq. (8.8), the MAP test can be expressed in the following likelihood ratio test as

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta = \frac{P(H_0)}{P(H_1)} \qquad (8.15)$$

where $\eta = P(H_0)/P(H_1)$ is the threshold value for the MAP test. Note that when $P(H_0) = P(H_1)$, the maximum-likelihood test is also the MAP test.

C. Neyman-Pearson Test:

As we mentioned before, it is not possible to simultaneously minimize both $\alpha$ ($= P_I$) and $\beta$ ($= P_{II}$). The Neyman-Pearson test provides a workable solution to this problem in that the test minimizes $\beta$ for a given level of $\alpha$. Hence, the Neyman-Pearson test is the test which maximizes the power of the test $1 - \beta$ for a given level of significance $\alpha$. In the Neyman-Pearson test, the critical (or rejection) region $R_1$ is selected such that $1 - \beta = 1 - P(D_0 \mid H_1) = P(D_1 \mid H_1)$ is maximum subject to the constraint $\alpha = P(D_1 \mid H_0) = \alpha_0$. This is a classical problem in optimization: maximizing a function subject to a constraint, which can be solved by the use of the Lagrange multiplier method. We thus construct the objective function

$$J = (1 - \beta) - \lambda(\alpha - \alpha_0) \qquad (8.16)$$

where $\lambda$ is a Lagrange multiplier. Then the critical region $R_1$ is chosen to maximize $J$. It can be shown that the Neyman-Pearson test can be expressed in terms of the likelihood ratio test as

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta = \lambda$$

(Prob. 8.8), where the threshold value $\eta$ of the test is equal to the Lagrange multiplier $\lambda$, which is chosen to satisfy the constraint $\alpha = \alpha_0$.

D. Bayes' Test:

Let $C_{ij}$ be the cost associated with $(D_i, H_j)$, which denotes the event that we accept $H_i$ when $H_j$ is true. Then the average cost, which is known as the Bayes' risk, can be written as

$$\bar{C} = C_{00}P(D_0, H_0) + C_{10}P(D_1, H_0) + C_{01}P(D_0, H_1) + C_{11}P(D_1, H_1) \qquad (8.18)$$

where $P(D_i, H_j)$ denotes the probability that we accept $H_i$ when $H_j$ is true. By Bayes' rule (1.42), we have

$$P(D_i, H_j) = P(D_i \mid H_j)P(H_j)$$

In general, we assume that

$$C_{10} > C_{00} \qquad C_{01} > C_{11}$$

since it is reasonable to assume that the cost of making an incorrect decision is higher than the cost of making a correct decision. The test that minimizes the average cost $\bar{C}$ is called the Bayes' test, and it can be expressed in terms of the likelihood ratio test as (Prob. 8.10)

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta = \frac{(C_{10} - C_{00})P(H_0)}{(C_{01} - C_{11})P(H_1)} \qquad (8.21)$$

Note that when $C_{10} - C_{00} = C_{01} - C_{11}$, the Bayes' test (8.21) and the MAP test (8.15) are identical.

E. Minimum Probability of Error Test:

If we set $C_{00} = C_{11} = 0$ and $C_{01} = C_{10} = 1$ in Eq. (8.18), we have

$$\bar{C} = P(D_1, H_0) + P(D_0, H_1) = P_e$$

which is just the probability of making an incorrect decision. Thus, in this case, the Bayes' test yields the minimum probability of error, and Eq. (8.21) becomes

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta = \frac{P(H_0)}{P(H_1)}$$

We see that the minimum probability of error test is the same as the MAP test.

F. Minimax Test:

We have seen that the Bayes' test requires the a priori probabilities $P(H_0)$ and $P(H_1)$. Frequently, these probabilities are not known. In such a case, the Bayes' test cannot be applied, and the following minimax (min-max) test may be used. In the minimax test, we use the Bayes' test which corresponds to the least favorable $P(H_0)$ (Prob. 8.12). In the minimax test, the critical region $R_1^*$ is defined by

$$\max_{P(H_0)} \bar{C}[P(H_0), R_1^*] = \min_{R_1}\max_{P(H_0)} \bar{C}[P(H_0), R_1] < \max_{P(H_0)} \bar{C}[P(H_0), R_1]$$

for all $R_1 \ne R_1^*$. In other words, $R_1^*$ is the critical region which yields the minimum Bayes' risk for the least favorable $P(H_0)$. Assuming that the minimization and maximization operations are interchangeable, then we have

$$\min_{R_1}\max_{P(H_0)} \bar{C}[P(H_0), R_1] = \max_{P(H_0)}\min_{R_1} \bar{C}[P(H_0), R_1] \qquad (8.25)$$

The minimization of $\bar{C}[P(H_0), R_1]$ with respect to $R_1$ is simply the Bayes' test, so that

$$\min_{R_1} \bar{C}[P(H_0), R_1] = \bar{C}^*[P(H_0)]$$

where $\bar{C}^*[P(H_0)]$ is the minimum Bayes' risk associated with the a priori probability $P(H_0)$. Thus, Eq. (8.25) states that we may find the minimax test by finding the Bayes' test for the least favorable $P(H_0)$, that is, the $P(H_0)$ which maximizes $\bar{C}^*[P(H_0)]$.

Solved Problems

HYPOTHESIS TESTING

8.1. Suppose a manufacturer of memory chips observes that the probability of chip failure is $p = 0.05$. A new procedure is introduced to improve the design of chips. To test this new procedure, 200 chips could be produced using this new procedure and tested. Let r.v. $X$ denote the number of these 200 chips that fail. We set the test rule that we would accept the new procedure if $X \le 5$. Let

$H_0$: $p = 0.05$ (No change hypothesis)
$H_1$: $p < 0.05$ (Improvement hypothesis)

Find the probability of a Type I error.

If we assume that these tests using the new procedure are independent and have the same probability of failure on each test, then $X$ is a binomial r.v. with parameters $(n, p) = (200, p)$. We make a Type I error if $X \le 5$ when in fact $p = 0.05$. Thus, using Eq. (2.37), we have

$$P_I = P(D_1 \mid H_0) = P(X \le 5; p = 0.05) = \sum_{k=0}^{5}\binom{200}{k}(0.05)^k(0.95)^{200 - k}$$

Since $n$ is rather large and $p$ is small, these binomial probabilities can be approximated by Poisson probabilities with $\lambda = np = 200(0.05) = 10$ (see Prob. 2.40). Thus, using Eq. (2.100), we obtain

$$P_I \approx \sum_{k=0}^{5} e^{-10}\frac{10^k}{k!} = 0.067$$

Note that $H_0$ is a simple hypothesis but $H_1$ is a composite hypothesis.

8.2. Consider again the memory chip manufacturing problem of Prob. 8.1. Now let

$H_0$: $p = 0.05$ (No change hypothesis)
$H_1$: $p = 0.02$ (Improvement hypothesis)

Again our rule is, we would reject the new procedure if $X > 5$. Find the probability of a Type II error.

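Now both hypotheses are simple. We make a Type II error if $X > 5$ when in fact $p = 0.02$. Hence, by Eq. (2.37),

$$P_{II} = P(X > 5;\ p = 0.02) = \sum_{k=6}^{200} \binom{200}{k} (0.02)^k (0.98)^{200-k}$$

Again using the Poisson approximation with $\lambda = np = 200(0.02) = 4$, we obtain

$$P_{II} \approx 1 - \sum_{k=0}^{5} e^{-4} \frac{4^k}{k!} \approx 0.215$$

8.3. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. $X$ with mean $\mu$ and variance 100. Let

$H_0$: $\mu = 50$
$H_1$: $\mu = \mu_1$ $(> 50)$

and sample size $n = 25$. As a decision procedure, we use the rule to reject $H_0$ if $\bar{x} \ge 52$, where $\bar{x}$ is the value of the sample mean $\bar{X}$ defined by Eq. (7.27).

(a) Find the probability of rejecting $H_0$: $\mu = 50$ as a function of $\mu$ $(> 50)$.
(b) Find the probability of a Type I error $\alpha$.
(c) Find the probability of a Type II error $\beta$ (i) when $\mu_1 = 53$ and (ii) when $\mu_1 = 55$.

(a) Since the test calls for the rejection of $H_0$: $\mu = 50$ when $\bar{x} \ge 52$, the probability of rejecting $H_0$ is given by

$$g(\mu) = P(\bar{X} \ge 52;\ \mu) \qquad (8.27)$$

Now, by Eqs. (4.112) and (7.27), we have

$$\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{100}{25} = 4$$

Thus, $\bar{X}$ is $N(\mu; 4)$, and using Eq. (2.55), we obtain

$$g(\mu) = P\left(\frac{\bar{X} - \mu}{2} \ge \frac{52 - \mu}{2};\ \mu\right) = 1 - \Phi\left(\frac{52 - \mu}{2}\right) \qquad \mu \ge 50 \qquad (8.28)$$

(b) The function $g(\mu)$ is known as the power function of the test, and the value of $g(\mu)$ at $\mu = \mu_1$, $g(\mu_1)$, is called the power at $\mu_1$. Note that the power at $\mu = 50$, $g(50)$, is the probability of rejecting $H_0$: $\mu = 50$ when $H_0$ is true, that is, a Type I error. Thus, using Table A (Appendix A), we obtain

$$\alpha = P_I = g(50) = 1 - \Phi(1) = 0.1587$$

(c) Note that the power at $\mu = \mu_1$, $g(\mu_1)$, is the probability of rejecting $H_0$: $\mu = 50$ when $\mu = \mu_1$. Thus, $1 - g(\mu_1)$ is the probability of accepting $H_0$ when $\mu = \mu_1$, that is, the probability of a Type II error $\beta$.

(i) Setting $\mu = \mu_1 = 53$ in Eq. (8.28) and using Table A (Appendix A), we obtain

$$\beta = P_{II} = 1 - g(53) = \Phi\left(\frac{52 - 53}{2}\right) = \Phi\left(-\tfrac{1}{2}\right) = 1 - \Phi\left(\tfrac{1}{2}\right) = 0.3085$$

(ii) Similarly, for $\mu = \mu_1 = 55$ we obtain

$$\beta = P_{II} = 1 - g(55) = \Phi\left(\frac{52 - 55}{2}\right) = \Phi\left(-\tfrac{3}{2}\right) = 1 - \Phi\left(\tfrac{3}{2}\right) = 0.0668$$

Notice that the probability of a Type II error clearly depends on the value of $\mu_1$.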
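The power function of Prob. 8.3 can also be checked numerically. The following is a minimal sketch, assuming only that the sample mean is normal with variance $100/25 = 4$ as stated in the problem; the function name `power` and its keyword arguments are illustrative choices, not notation from the text.

```python
from statistics import NormalDist


def power(mu, threshold=52.0, variance=100.0, n=25):
    """g(mu): probability that the sample mean is >= threshold when the true mean is mu.

    The sample mean of n normal observations with the given variance is
    normal with mean mu and standard deviation sqrt(variance / n).
    """
    sd_of_mean = (variance / n) ** 0.5
    return 1.0 - NormalDist(mu, sd_of_mean).cdf(threshold)


# Type I error: power evaluated at the null value mu = 50.
alpha = power(50.0)

# Type II errors: 1 - power at the two alternatives of part (c).
beta_53 = 1.0 - power(53.0)
beta_55 = 1.0 - power(55.0)

print(f"alpha = {alpha:.4f}, beta(53) = {beta_53:.4f}, beta(55) = {beta_55:.4f}")
```

Changing `threshold` or `n` in the call to `power` gives the corresponding quantities for a modified decision rule or a different sample size.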

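8.4. Consider the binary decision problem of Prob. 8.3. We modify the decision rule such that we reject $H_0$ if $\bar{x} \ge c$.

(a) Find the value of $c$ such that the probability of a Type I error $\alpha = 0.05$.
(b) Find the probability of a Type II error $\beta$ when $\mu_1 = 55$ with the modified decision rule.

(a) Using the result of part (b) in Prob. 8.3, $c$ is selected such that [see Eq. (8.27)]

$$\alpha = g(50) = P(\bar{X} \ge c;\ \mu = 50) = 0.05$$

However, when $\mu = 50$, $\bar{X} = N(50; 4)$, and

$$g(50) = P\left(\frac{\bar{X} - 50}{2} \ge \frac{c - 50}{2};\ \mu = 50\right) = 1 - \Phi\left(\frac{c - 50}{2}\right) = 0.05$$

From Table A (Appendix A), we have $\Phi(1.645) = 0.95$. Thus

$$\frac{c - 50}{2} = 1.645 \qquad \text{and} \qquad c = 50 + 2(1.645) = 53.29$$

(b) The power function $g(\mu)$ with the modified decision rule is

$$g(\mu) = P(\bar{X} \ge 53.29;\ \mu) = 1 - \Phi\left(\frac{53.29 - \mu}{2}\right)$$

Setting $\mu = \mu_1 = 55$ and using Table A (Appendix A), we obtain

$$\beta = P_{II} = 1 - g(55) = \Phi\left(\frac{53.29 - 55}{2}\right) = \Phi(-0.855) = 1 - \Phi(0.855) = 0.1963$$

Comparing with the results of Prob. 8.3, we notice that with the change of the decision rule, $\alpha$ is reduced from 0.1587 to 0.05, but $\beta$ is increased from 0.0668 to 0.1963.

8.5. Redo Prob. 8.4 for the case where the sample size $n = 100$.

(a) With $n = 100$, we have

$$\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{100}{100} = 1$$

As in part (a) of Prob. 8.4, $c$ is selected so that

$$\alpha = g(50) = P(\bar{X} \ge c;\ \mu = 50) = 0.05$$

Since $\bar{X} = N(50; 1)$, we have

$$g(50) = 1 - \Phi(c - 50) = 0.05$$

Thus $c - 50 = 1.645$ and $c = 51.645$.

(b) The power function is

$$g(\mu) = P(\bar{X} \ge 51.645;\ \mu) = 1 - \Phi(51.645 - \mu)$$

Setting $\mu = \mu_1 = 55$ and using Table A (Appendix A), we obtain

$$\beta = P_{II} = 1 - g(55) = \Phi(51.645 - 55) = \Phi(-3.355) \approx 0.0004$$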

Notice that with sample size $n = 100$, both $\alpha$ and $\beta$ have decreased from their respective original values of 0.1587 and 0.0668 when $n = 25$.

DECISION TESTS

8.6. In a simple binary communication system, during every $T$ seconds, one of two possible signals $s_0(t)$ and $s_1(t)$ is transmitted. Our two hypotheses are

$H_0$: $s_0(t)$ was transmitted.
$H_1$: $s_1(t)$ was transmitted.

We assume that

$$s_0(t) = 0 \qquad \text{and} \qquad s_1(t) = 1 \qquad 0 < t < T$$

The communication channel adds noise $n(t)$, which is a zero-mean normal random process with variance 1. Let $x(t)$ represent the received signal:

$$x(t) = s_i(t) + n(t) \qquad i = 0, 1$$

We observe the received signal $x(t)$ at some instant during each signaling interval. Suppose that we received an observation $x = 0.6$.

(a) Using the maximum likelihood test, determine which signal is transmitted.
(b) Find $P_I$ and $P_{II}$.

(a) The received signal under each hypothesis can be written as

$H_0$: $x = n$
$H_1$: $x = 1 + n$

Then the pdf of $x$ under each hypothesis is given by

$$f(x \mid H_0) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \qquad f(x \mid H_1) = \frac{1}{\sqrt{2\pi}} e^{-(x-1)^2/2}$$

The likelihood ratio is then given by

$$\Lambda(x) = \frac{f(x \mid H_1)}{f(x \mid H_0)} = e^{(x - 1/2)}$$

By Eq. (8.9), the maximum likelihood test is

$$e^{(x - 1/2)} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

Taking the natural logarithm of the above expression, we get

$$x - \tfrac{1}{2} \underset{H_0}{\overset{H_1}{\gtrless}} 0 \qquad \text{or} \qquad x \underset{H_0}{\overset{H_1}{\gtrless}} \tfrac{1}{2}$$

Since $x = 0.6 > \tfrac{1}{2}$, we determine that signal $s_1(t)$ was transmitted.

(b) The decision regions are given by

$$R_0 = \{x: x < \tfrac{1}{2}\} = (-\infty, \tfrac{1}{2}) \qquad R_1 = \{x: x > \tfrac{1}{2}\} = (\tfrac{1}{2}, \infty)$$

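Then by Eqs. (8.1) and (8.2) and using Table A (Appendix A), we obtain

$$P_I = P(D_1 \mid H_0) = \int_{R_1} f(x \mid H_0)\,dx = \frac{1}{\sqrt{2\pi}} \int_{1/2}^{\infty} e^{-x^2/2}\,dx = 1 - \Phi\left(\tfrac{1}{2}\right) = 0.3085$$

$$P_{II} = P(D_0 \mid H_1) = \int_{R_0} f(x \mid H_1)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{1/2} e^{-(x-1)^2/2}\,dx = \Phi\left(-\tfrac{1}{2}\right) = 0.3085$$

8.7. In the binary communication system of Prob. 8.6, suppose that $P(H_0) = \tfrac{2}{3}$ and $P(H_1) = \tfrac{1}{3}$.

(a) Using the MAP test, determine which signal is transmitted when $x = 0.6$.
(b) Find $P_I$ and $P_{II}$.

(a) Using the result of Prob. 8.6 and Eq. (8.15), the MAP test is given by

$$e^{(x - 1/2)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P(H_0)}{P(H_1)} = 2$$

Taking the natural logarithm of the above expression, we get

$$x - \tfrac{1}{2} \underset{H_0}{\overset{H_1}{\gtrless}} \ln 2 \qquad \text{or} \qquad x \underset{H_0}{\overset{H_1}{\gtrless}} \tfrac{1}{2} + \ln 2 = 1.193$$

Since $x = 0.6 < 1.193$, we determine that signal $s_0(t)$ was transmitted.

(b) The decision regions are given by

$$R_0 = \{x: x < 1.193\} = (-\infty, 1.193) \qquad R_1 = \{x: x > 1.193\} = (1.193, \infty)$$

Thus, by Eqs. (8.1) and (8.2) and using Table A (Appendix A), we obtain

$$P_I = P(D_1 \mid H_0) = \frac{1}{\sqrt{2\pi}} \int_{1.193}^{\infty} e^{-x^2/2}\,dx = 1 - \Phi(1.193) = 0.1164$$

$$P_{II} = P(D_0 \mid H_1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{1.193} e^{-(x-1)^2/2}\,dx = \Phi(0.193) = 0.5765$$

8.8. Derive the Neyman-Pearson test, Eq. (8.17).

From Eq. (8.16), the objective function is

$$J = (1 - \beta) - \lambda(\alpha - \alpha_0) = P(D_1 \mid H_1) - \lambda[P(D_1 \mid H_0) - \alpha_0] \qquad (8.29)$$

where $\lambda$ is an undetermined Lagrange multiplier which is chosen to satisfy the constraint $\alpha = \alpha_0$. Now, we wish to choose the critical region $R_1$ to maximize $J$. Using Eqs. (8.1) and (8.2), we have

$$J = \int_{R_1} f(x \mid H_1)\,dx - \lambda\left[\int_{R_1} f(x \mid H_0)\,dx - \alpha_0\right] = \int_{R_1} [f(x \mid H_1) - \lambda f(x \mid H_0)]\,dx + \lambda\alpha_0 \qquad (8.30)$$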

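To maximize $J$ by selecting the critical region $R_1$, we select $x \in R_1$ such that the integrand in Eq. (8.30) is positive. Thus $R_1$ is given by

$$R_1 = \{x: [f(x \mid H_1) - \lambda f(x \mid H_0)] > 0\}$$

and the Neyman-Pearson test is given by

$$\Lambda(x) = \frac{f(x \mid H_1)}{f(x \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$$

and $\lambda$ is determined such that the constraint

$$\alpha = P_I = \int_{R_1} f(x \mid H_0)\,dx = \alpha_0$$

is satisfied.

8.9. Consider the binary communication system of Prob. 8.6 and suppose that we require that $\alpha = P_I = 0.25$. Using the Neyman-Pearson test, determine which signal is transmitted when $x = 0.6$. Find $P_{II}$.

Using the result of Prob. 8.6 and Eq. (8.17), the Neyman-Pearson test is given by

$$e^{(x - 1/2)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$$

Taking the natural logarithm of the above expression, we get

$$x - \tfrac{1}{2} \underset{H_0}{\overset{H_1}{\gtrless}} \ln \lambda \qquad \text{or} \qquad x \underset{H_0}{\overset{H_1}{\gtrless}} \tfrac{1}{2} + \ln \lambda$$

The critical region $R_1$ is thus

$$R_1 = \{x: x > \tfrac{1}{2} + \ln \lambda\}$$

Now we must determine $\lambda$ such that $\alpha = P_I = P(D_1 \mid H_0) = 0.25$. By Eq. (8.1), we have

$$P_I = \int_{R_1} f(x \mid H_0)\,dx = \frac{1}{\sqrt{2\pi}} \int_{1/2 + \ln \lambda}^{\infty} e^{-x^2/2}\,dx = 1 - \Phi\left(\tfrac{1}{2} + \ln \lambda\right)$$

Thus

$$1 - \Phi\left(\tfrac{1}{2} + \ln \lambda\right) = 0.25 \qquad \text{or} \qquad \Phi\left(\tfrac{1}{2} + \ln \lambda\right) = 0.75$$

From Table A (Appendix A), we find that $\Phi(0.674) = 0.75$. Thus

$$\tfrac{1}{2} + \ln \lambda = 0.674 \qquad \text{so that} \qquad \lambda = e^{0.174} \approx 1.19$$

Then the Neyman-Pearson test is

$$x \underset{H_0}{\overset{H_1}{\gtrless}} 0.674$$

Since $x = 0.6 < 0.674$, we determine that signal $s_0(t)$ was transmitted. By Eq. (8.2), we have

$$P_{II} = P(D_0 \mid H_1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0.674} e^{-(x-1)^2/2}\,dx = \Phi(-0.326) = 0.3722$$

8.10. Derive Eq. (8.21).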

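By Eq. (8.19), the Bayes' risk is

$$\bar{C} = C_{00}P(D_0 \mid H_0)P(H_0) + C_{10}P(D_1 \mid H_0)P(H_0) + C_{01}P(D_0 \mid H_1)P(H_1) + C_{11}P(D_1 \mid H_1)P(H_1)$$

Now we can express

$$P(D_i \mid H_j) = \int_{R_i} f(x \mid H_j)\,dx \qquad i = 0, 1;\ j = 0, 1$$

Then $\bar{C}$ can be expressed as

$$\bar{C} = C_{00}P(H_0)\int_{R_0} f(x \mid H_0)\,dx + C_{10}P(H_0)\int_{R_1} f(x \mid H_0)\,dx + C_{01}P(H_1)\int_{R_0} f(x \mid H_1)\,dx + C_{11}P(H_1)\int_{R_1} f(x \mid H_1)\,dx \qquad (8.32)$$

Since $R_0 \cup R_1 = S$ and $R_0 \cap R_1 = \varnothing$, we can write

$$\int_{R_0} f(x \mid H_j)\,dx = \int_{S} f(x \mid H_j)\,dx - \int_{R_1} f(x \mid H_j)\,dx = 1 - \int_{R_1} f(x \mid H_j)\,dx$$

Then Eq. (8.32) becomes

$$\bar{C} = C_{00}P(H_0) + C_{01}P(H_1) + \int_{R_1} \{[(C_{10} - C_{00})P(H_0)f(x \mid H_0)] - [(C_{01} - C_{11})P(H_1)f(x \mid H_1)]\}\,dx$$

The only variable in the above expression is the critical region $R_1$. By the assumptions [Eq. (8.20)] $C_{10} > C_{00}$ and $C_{01} > C_{11}$, the two terms inside the brackets in the integral are both positive. Thus, $\bar{C}$ is minimized if $R_1$ is chosen such that

$$(C_{01} - C_{11})P(H_1)f(x \mid H_1) > (C_{10} - C_{00})P(H_0)f(x \mid H_0)$$

for all $x \in R_1$. That is, we decide to accept $H_1$ if this inequality holds. In terms of the likelihood ratio, we obtain

$$\Lambda(x) = \frac{f(x \mid H_1)}{f(x \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(C_{10} - C_{00})P(H_0)}{(C_{01} - C_{11})P(H_1)}$$

which is Eq. (8.21).

8.11. Consider a binary decision problem with the following conditional pdf's:

$$f(x \mid H_0) = \tfrac{1}{2} e^{-|x|} \qquad f(x \mid H_1) = e^{-2|x|}$$

The Bayes' costs are given by

$$C_{00} = C_{11} = 0 \qquad C_{01} = 2 \qquad C_{10} = 1$$

(a) Determine the Bayes' test if $P(H_0) = \tfrac{2}{3}$ and the associated Bayes' risk.
(b) Repeat (a) with $P(H_0) = \tfrac{1}{2}$.

(a) The likelihood ratio is

$$\Lambda(x) = \frac{f(x \mid H_1)}{f(x \mid H_0)} = \frac{e^{-2|x|}}{\tfrac{1}{2} e^{-|x|}} = 2e^{-|x|} \qquad (8.33)$$

By Eq. (8.21), the Bayes' test is given by

$$2e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(1 - 0)\tfrac{2}{3}}{(2 - 0)\tfrac{1}{3}} = 1 \qquad \text{or} \qquad e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \tfrac{1}{2}$$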

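Taking the natural logarithm of both sides of the last expression yields

$$|x| \underset{H_0}{\overset{H_1}{\lessgtr}} -\ln\left(\tfrac{1}{2}\right) = 0.693$$

Thus, the decision regions are given by

$$R_0 = \{x: |x| > 0.693\} \qquad R_1 = \{x: |x| < 0.693\}$$

Then

$$P_I = P(D_1 \mid H_0) = \int_{-0.693}^{0.693} \tfrac{1}{2} e^{-|x|}\,dx = 2\int_{0}^{0.693} \tfrac{1}{2} e^{-x}\,dx = 0.5$$

$$P_{II} = P(D_0 \mid H_1) = \int_{-\infty}^{-0.693} e^{2x}\,dx + \int_{0.693}^{\infty} e^{-2x}\,dx = 2\int_{0.693}^{\infty} e^{-2x}\,dx = 0.25$$

and by Eq. (8.19), the Bayes' risk is

$$\bar{C} = C_{10}P(D_1 \mid H_0)P(H_0) + C_{01}P(D_0 \mid H_1)P(H_1) = (0.5)\left(\tfrac{2}{3}\right) + 2(0.25)\left(\tfrac{1}{3}\right) = 0.5$$

(b) The Bayes' test is

$$2e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(1 - 0)\tfrac{1}{2}}{(2 - 0)\tfrac{1}{2}} = \tfrac{1}{2} \qquad \text{or} \qquad e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \tfrac{1}{4}$$

Again, taking the natural logarithm of both sides of the last expression yields

$$|x| \underset{H_0}{\overset{H_1}{\lessgtr}} -\ln\left(\tfrac{1}{4}\right) = 1.386$$

Thus, the decision regions are given by

$$R_0 = \{x: |x| > 1.386\} \qquad R_1 = \{x: |x| < 1.386\}$$

Then

$$P_I = P(D_1 \mid H_0) = 2\int_{0}^{1.386} \tfrac{1}{2} e^{-x}\,dx = 0.75 \qquad P_{II} = P(D_0 \mid H_1) = 2\int_{1.386}^{\infty} e^{-2x}\,dx = 0.0625$$

and by Eq. (8.19), the Bayes' risk is

$$\bar{C} = (0.75)\left(\tfrac{1}{2}\right) + 2(0.0625)\left(\tfrac{1}{2}\right) = 0.4375$$

8.12. Consider the binary decision problem of Prob. 8.11 with the same Bayes' costs. Determine the minimax test.

From Eq. (8.33), the likelihood ratio is

$$\Lambda(x) = 2e^{-|x|}$$

In terms of $P(H_0)$, the Bayes' test [Eq. (8.21)] becomes

$$2e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}\,\frac{P(H_0)}{1 - P(H_0)} \qquad \text{or} \qquad e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{4}\,\frac{P(H_0)}{1 - P(H_0)}$$

Taking the natural logarithm of both sides of the last expression yields

$$|x| \underset{H_0}{\overset{H_1}{\lessgtr}} \ln\frac{4[1 - P(H_0)]}{P(H_0)} = \delta \qquad (8.34)$$

For $P(H_0) > 0.8$, $\delta$ becomes negative, and we always decide $H_0$. For $P(H_0) \le 0.8$, the decision regions are

$$R_0 = \{x: |x| > \delta\} \qquad R_1 = \{x: |x| < \delta\}$$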

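Then, by setting $C_{00} = C_{11} = 0$, $C_{01} = 2$, and $C_{10} = 1$ in Eq. (8.19), the minimum Bayes' risk $\bar{C}^*$ can be expressed as a function of $P(H_0)$ as

$$\bar{C}^*[P(H_0)] = P(H_0)\int_{-\delta}^{\delta} \tfrac{1}{2} e^{-|x|}\,dx + 2[1 - P(H_0)]\left[\int_{-\infty}^{-\delta} e^{2x}\,dx + \int_{\delta}^{\infty} e^{-2x}\,dx\right]$$

$$= P(H_0)\int_{0}^{\delta} e^{-x}\,dx + 4[1 - P(H_0)]\int_{\delta}^{\infty} e^{-2x}\,dx$$

$$= P(H_0)(1 - e^{-\delta}) + 2[1 - P(H_0)]e^{-2\delta} \qquad (8.35)$$

From the definition of $\delta$ [Eq. (8.34)], we have

$$e^{\delta} = \frac{4[1 - P(H_0)]}{P(H_0)}$$

Thus

$$e^{-\delta} = \frac{P(H_0)}{4[1 - P(H_0)]} \qquad \text{and} \qquad e^{-2\delta} = \frac{P^2(H_0)}{16[1 - P(H_0)]^2}$$

Substituting these values into Eq. (8.35), we obtain

$$\bar{C}^*[P(H_0)] = \frac{8P(H_0) - 9P^2(H_0)}{8[1 - P(H_0)]}$$

Now the value of $P(H_0)$ which maximizes $\bar{C}^*$ can be obtained by setting $d\bar{C}^*[P(H_0)]/dP(H_0)$ equal to zero and solving for $P(H_0)$. The result yields $P(H_0) = \tfrac{2}{3}$. Substituting this value into Eq. (8.34), we obtain the following minimax test:

$$|x| \underset{H_0}{\overset{H_1}{\lessgtr}} \ln 2 = 0.693$$

8.13. Suppose that we have $n$ observations $X_i$, $i = 1, \ldots, n$, of radar signals, and $X_i$ are normal iid r.v.'s under each hypothesis. Under $H_0$, $X_i$ have mean $\mu_0$ and variance $\sigma^2$, while under $H_1$, $X_i$ have mean $\mu_1$ and variance $\sigma^2$, and $\mu_1 > \mu_0$. Determine the maximum likelihood test.

By Eq. (2.52) for each $X_i$, we have

$$f(x_i \mid H_0) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x_i - \mu_0)^2\right] \qquad f(x_i \mid H_1) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x_i - \mu_1)^2\right]$$

Since the $X_i$ are independent, we have

$$f(\mathbf{x} \mid H_j) = \prod_{i=1}^{n} f(x_i \mid H_j) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu_j)^2\right] \qquad j = 0, 1$$

With $\mu_1 - \mu_0 > 0$, the likelihood ratio is then given by

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} = \exp\left\{\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n} 2(\mu_1 - \mu_0)x_i - n(\mu_1^2 - \mu_0^2)\right]\right\}$$

Hence, the maximum likelihood test is given by

$$\exp\left\{\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n} 2(\mu_1 - \mu_0)x_i - n(\mu_1^2 - \mu_0^2)\right]\right\} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$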

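Taking the natural logarithm of both sides of the above expression yields

$$\frac{\mu_1 - \mu_0}{\sigma^2}\sum_{i=1}^{n} x_i \underset{H_0}{\overset{H_1}{\gtrless}} \frac{n(\mu_1^2 - \mu_0^2)}{2\sigma^2} \qquad \text{or} \qquad \frac{1}{n}\sum_{i=1}^{n} x_i \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\mu_1 + \mu_0}{2} \qquad (8.36)$$

Equation (8.36) indicates that the statistic

$$s(X_1, \ldots, X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$

provides enough information about the observations to enable us to make a decision. Thus, it is called the sufficient statistic for the maximum likelihood test.

8.14. Consider the same observations $X_i$, $i = 1, \ldots, n$, of radar signals as in Prob. 8.13, but now, under $H_0$, $X_i$ have zero mean and variance $\sigma_0^2$, while under $H_1$, $X_i$ have zero mean and variance $\sigma_1^2$, and $\sigma_1^2 > \sigma_0^2$. Determine the maximum likelihood test.

In a similar manner as in Prob. 8.13, we obtain

$$f(\mathbf{x} \mid H_0) = \frac{1}{(2\pi\sigma_0^2)^{n/2}} \exp\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^{n} x_i^2\right) \qquad f(\mathbf{x} \mid H_1) = \frac{1}{(2\pi\sigma_1^2)^{n/2}} \exp\left(-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n} x_i^2\right)$$

With $\sigma_1^2 - \sigma_0^2 > 0$, the likelihood ratio is

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} = \left(\frac{\sigma_0}{\sigma_1}\right)^n \exp\left[\left(\frac{\sigma_1^2 - \sigma_0^2}{2\sigma_0^2\sigma_1^2}\right)\sum_{i=1}^{n} x_i^2\right]$$

and the maximum likelihood test is

$$\left(\frac{\sigma_0}{\sigma_1}\right)^n \exp\left[\left(\frac{\sigma_1^2 - \sigma_0^2}{2\sigma_0^2\sigma_1^2}\right)\sum_{i=1}^{n} x_i^2\right] \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

Taking the natural logarithm of both sides of the above expression yields

$$\sum_{i=1}^{n} x_i^2 \underset{H_0}{\overset{H_1}{\gtrless}} n\left[\ln\left(\frac{\sigma_1}{\sigma_0}\right)\right]\left(\frac{2\sigma_0^2\sigma_1^2}{\sigma_1^2 - \sigma_0^2}\right)$$

Note that in this case,

$$s(X_1, \ldots, X_n) = \sum_{i=1}^{n} X_i^2$$

is the sufficient statistic for the maximum likelihood test.

8.15. In the binary communication system of Prob. 8.6, suppose that we have $n$ independent observations $X_i = X(t_i)$, $i = 1, \ldots, n$, where $0 < t_1 < \cdots < t_n \le T$.

(a) Determine the maximum likelihood test.
(b) Find $P_I$ and $P_{II}$ for $n = 5$ and $n = 10$.

(a) Setting $\mu_0 = 0$ and $\mu_1 = 1$ in Eq. (8.36), the maximum likelihood test is

$$\frac{1}{n}\sum_{i=1}^{n} x_i \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}$$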

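(b) Let

$$Y = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Then by Eqs. (4.108) and (4.112), and the result of Prob. 5.60, we see that $Y$ is a normal r.v. with zero mean and variance $1/n$ under $H_0$, and is a normal r.v. with mean 1 and variance $1/n$ under $H_1$. Thus

$$P_I = P(D_1 \mid H_0) = \int_{R_1} f_Y(y \mid H_0)\,dy = \frac{\sqrt{n}}{\sqrt{2\pi}} \int_{1/2}^{\infty} e^{-(n/2)y^2}\,dy = 1 - \Phi\left(\frac{\sqrt{n}}{2}\right)$$

$$P_{II} = P(D_0 \mid H_1) = \int_{R_0} f_Y(y \mid H_1)\,dy = \frac{\sqrt{n}}{\sqrt{2\pi}} \int_{-\infty}^{1/2} e^{-(n/2)(y-1)^2}\,dy = \Phi\left(-\frac{\sqrt{n}}{2}\right) = 1 - \Phi\left(\frac{\sqrt{n}}{2}\right)$$

Note that $P_I = P_{II}$. Using Table A (Appendix A), we have

$P_I = P_{II} = 1 - \Phi(1.118) \approx 0.132$ for $n = 5$
$P_I = P_{II} = 1 - \Phi(1.581) \approx 0.057$ for $n = 10$

8.16. In the binary communication system of Prob. 8.6, suppose that $s_0(t)$ and $s_1(t)$ are arbitrary signals and that $n$ observations of the received signal $x(t)$ are made. Let $n$ samples of $s_0(t)$ and $s_1(t)$ be represented, respectively, by

$$\mathbf{s}_0 = [s_{01}, s_{02}, \ldots, s_{0n}]^T \qquad \text{and} \qquad \mathbf{s}_1 = [s_{11}, s_{12}, \ldots, s_{1n}]^T$$

where $T$ denotes "transpose of." Determine the MAP test.

For each $X_i$, we can write

$$f(x_i \mid H_0) = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(x_i - s_{0i})^2\right] \qquad f(x_i \mid H_1) = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(x_i - s_{1i})^2\right]$$

Since the noise components are independent, we have

$$f(\mathbf{x} \mid H_j) = \prod_{i=1}^{n} f(x_i \mid H_j) \qquad j = 0, 1$$

and the likelihood ratio is given by

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} = \exp\left[\sum_{i=1}^{n}(s_{1i} - s_{0i})x_i - \frac{1}{2}\sum_{i=1}^{n}(s_{1i}^2 - s_{0i}^2)\right]$$

Thus, by Eq. (8.15), the MAP test is given by

$$\exp\left[\sum_{i=1}^{n}(s_{1i} - s_{0i})x_i - \frac{1}{2}\sum_{i=1}^{n}(s_{1i}^2 - s_{0i}^2)\right] \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P(H_0)}{P(H_1)}$$

Taking the natural logarithm of both sides of the above expression yields

$$\sum_{i=1}^{n}(s_{1i} - s_{0i})x_i \underset{H_0}{\overset{H_1}{\gtrless}} \ln\left[\frac{P(H_0)}{P(H_1)}\right] + \frac{1}{2}\sum_{i=1}^{n}(s_{1i}^2 - s_{0i}^2)$$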

287 CHAP. 81 DECISION THEORY Supplementary Problems Let (XI,..., X,) be a random sample of a Bernoulli r.v. X with pmf where it is known that 0 < p 5 3. Let and n = 20. As a decision test, we use the rule to reject H, if I:=, xi s 6. (a) Find the power function g(p) of the test. (b) Find the probability of a Type I error a. (c) Find the probability of a Type I1 (i) when p, = $ and (ii) when p, = &. Ans. (a) g@)= (2i)pk(l -P)~'-* O<pli k=o (b) a = ; (c) (i) /3 = , (ii) /3 = Let (X,,..., X,) be a random sample of a normal r.v. X with mean p and variance 36. Let H,: p=50 HI: p = 55 As a decision test, we use the rule to accept H, if 2 < 53, where 2 is the value of the sample mean. (a) Find the expression for the critical region R,. (b) Find a for n = 16. (b) a = , 8 = Let (X,,..., X,) be a random sample of a normal r.v. X with mean p and variance 100. Let As a decision test, we use the rule that we reject H, if 22 c. Find the value of c and sample size n such that a = and B = Ans. c = , n = Let X be a normal r.v. with zero mean and variance a2. Let Determine the maximum likelihood test. HI Ans. 1x Ho H,: a2 = 1 HI: a2 = Consider the binary decision problem of Prob Let P(H,) ==,$ and P(H,) = 3. Determine the MAP test.

288 280 DECISION THEORY [CHAP Consider the binary communication system of Prob (a) Construct a Neyman-Pearson test for the case where a = 0.1. (b) Find g. H1 Ans. (a) Ixl><1.282; (b) /?= Ho Consider the binary decision problem of Prob Determine the Bayes' test if P(Ho) = 0.25 and the Bayes' costs are H1 Ans. 1x Ho Coo = Cll = 0 CO1 = 1 CIO = 2
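The maximum likelihood, MAP, and Neyman-Pearson tests of Probs. 8.6, 8.7, and 8.9 all reduce to comparing the observation $x$ with a threshold, so their error probabilities can be checked with one small routine. The sketch below assumes the setting of Prob. 8.6 (unit-variance normal noise, signal levels 0 and 1). The prior ratio $P(H_0)/P(H_1) = 2$ is an assumption inferred from the threshold 1.193 quoted in Prob. 8.7, and the function names are illustrative.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance


def error_probabilities(threshold):
    """Return (P_I, P_II) for the rule 'decide H1 when x > threshold'.

    Under H0 the observation is N(0, 1); under H1 it is N(1, 1).
    """
    p_type_1 = 1.0 - STD_NORMAL.cdf(threshold)   # decide H1 although H0 holds
    p_type_2 = STD_NORMAL.cdf(threshold - 1.0)   # decide H0 although H1 holds
    return p_type_1, p_type_2


def decide(x, threshold):
    """Return 1 if H1 is chosen for observation x, else 0."""
    return 1 if x > threshold else 0


thresholds = {
    "ML": 0.5,                                   # likelihood ratio compared with 1
    "MAP": 0.5 + log(2.0),                       # assumed prior ratio P(H0)/P(H1) = 2
    "Neyman-Pearson": STD_NORMAL.inv_cdf(0.75),  # chosen so that P_I = 0.25
}

for name, threshold in thresholds.items():
    p1, p2 = error_probabilities(threshold)
    print(f"{name}: threshold={threshold:.3f}, decision for x=0.6: H{decide(0.6, threshold)}, "
          f"P_I={p1:.4f}, P_II={p2:.4f}")
```

Because the thresholds are computed without rounding, the last digit of a result may differ slightly from values read off a four-place normal table.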

Chapter 9

Queueing Theory

9.1 INTRODUCTION

Queueing theory deals with the study of queues (waiting lines). Queues abound in practical situations. The earliest use of queueing theory was in the design of a telephone system. Applications of queueing theory are found in fields as seemingly diverse as traffic control, hospital management, and time-shared computer system design. In this chapter, we present an elementary queueing theory.

9.2 QUEUEING SYSTEMS

A. Description:

A simple queueing system is shown in Fig. 9-1. Customers arrive randomly at an average rate of $\lambda_a$. Upon arrival, they are served without delay if there are available servers; otherwise, they are made to wait in the queue until it is their turn to be served. Once served, they are assumed to leave the system. We will be interested in determining such quantities as the average number of customers in the system, the average time a customer spends in the system, the average time spent waiting in the queue, etc.

Fig. 9-1 A simple queueing system (Arrivals → Queue → Service → Departures).

The description of any queueing system requires the specification of three parts:

1. The arrival process
2. The service mechanism, such as the number of servers and service-time distribution
3. The queue discipline (for example, first-come, first-served)

B. Classification:

The notation A/B/s/K is used to classify a queueing system, where A specifies the type of arrival process, B denotes the service-time distribution, s specifies the number of servers, and K denotes the capacity of the system, that is, the maximum number of customers that can be accommodated. If K is not specified, it is assumed that the capacity of the system is unlimited. For example, an M/M/2 queueing system (M stands for Markov) is one with Poisson arrivals (or exponential interarrival time distribution), exponential service-time distribution, and 2 servers. An M/G/1 queueing system has Poisson arrivals, general service-time distribution, and a single server. A special case is the M/D/1 queueing system, where D stands for constant (deterministic) service time. Examples of queueing systems with limited capacity are telephone systems with limited trunks, hospital emergency rooms with limited beds, and airline terminals with limited space in which to park aircraft for loading and unloading. In each case, customers who arrive when the system is saturated are denied entrance and are lost.

C. Useful Formulas

Some basic quantities of queueing systems are

$L$: the average number of customers in the system
$L_q$: the average number of customers waiting in the queue
$L_s$: the average number of customers in service
$W$: the average amount of time that a customer spends in the system
$W_q$: the average amount of time that a customer spends waiting in the queue
$W_s$: the average amount of time that a customer spends in service

Many useful relationships between the above and other quantities of interest can be obtained by using the following basic cost identity: Assume that entering customers are required to pay an entrance fee (according to some rule) to the system. Then we have

Average rate at which the system earns = $\lambda_a$ × average amount an entering customer pays    (9.1)

where $\lambda_a$ is the average arrival rate of entering customers

$$\lambda_a = \lim_{t \to \infty} \frac{X(t)}{t}$$

and $X(t)$ denotes the number of customer arrivals by time $t$.

If we assume that each customer pays \$1 per unit time while in the system, Eq. (9.1) yields

$$L = \lambda_a W \qquad (9.2)$$

Equation (9.2) is sometimes known as Little's formula. Similarly, if we assume that each customer pays \$1 per unit time while in the queue, Eq. (9.1) yields

$$L_q = \lambda_a W_q \qquad (9.3)$$

If we assume that each customer pays \$1 per unit time while in service, Eq. (9.1) yields

$$L_s = \lambda_a W_s \qquad (9.4)$$

Note that Eqs. (9.2) to (9.4) are valid for almost all queueing systems, regardless of the arrival process, the number of servers, or queueing discipline.

9.3 BIRTH-DEATH PROCESS

We say that the queueing system is in state $S_n$ if there are $n$ customers in the system, including those being served. Let $N(t)$ be the Markov process that takes on the value $n$ when the queueing system is in state $S_n$ with the following assumptions:

1. If the system is in state $S_n$, it can make transitions only to $S_{n-1}$ or $S_{n+1}$, $n \ge 1$; that is, either a customer completes service and leaves the system or, while the present customer is still being serviced, another customer arrives at the system; from $S_0$, the next state can only be $S_1$.
2. If the system is in state $S_n$ at time $t$, the probability of a transition to $S_{n+1}$ in the time interval $(t, t + \Delta t)$ is $a_n \Delta t$. We refer to $a_n$ as the arrival parameter (or the birth parameter).
3. If the system is in state $S_n$ at time $t$, the probability of a transition to $S_{n-1}$ in the time interval $(t, t + \Delta t)$ is $d_n \Delta t$. We refer to $d_n$ as the departure parameter (or the death parameter).

The process $N(t)$ is sometimes referred to as the birth-death process. Let $p_n(t)$ be the probability that the queueing system is in state $S_n$ at time $t$; that is,

$$p_n(t) = P\{N(t) = n\}$$

Then we have the following fundamental recursive equations for $N(t)$ (Prob. 9.2):

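$$p_n'(t) = -(a_n + d_n)p_n(t) + a_{n-1}p_{n-1}(t) + d_{n+1}p_{n+1}(t) \qquad n \ge 1$$
$$p_0'(t) = -(a_0 + d_0)p_0(t) + d_1 p_1(t) \qquad (9.6)$$

Assume that in the steady state we have

$$\lim_{t \to \infty} p_n(t) = p_n$$

and setting $p_0'(t)$ and $p_n'(t) = 0$ in Eqs. (9.6), we obtain the following steady-state recursive equation:

$$(a_n + d_n)p_n = a_{n-1}p_{n-1} + d_{n+1}p_{n+1} \qquad n \ge 1 \qquad (9.8)$$

and for the special case with $d_0 = 0$,

$$a_0 p_0 = d_1 p_1 \qquad (9.9)$$

Equations (9.8) and (9.9) are also known as the steady-state equilibrium equations. The state transition diagram for the birth-death process is shown in Fig. 9-2.

Fig. 9-2 State transition diagram for the birth-death process.

Solving Eqs. (9.8) and (9.9) in terms of $p_0$, we obtain

$$p_1 = \frac{a_0}{d_1}p_0 \qquad p_2 = \frac{a_0 a_1}{d_1 d_2}p_0 \qquad \ldots \qquad p_n = \frac{a_0 a_1 \cdots a_{n-1}}{d_1 d_2 \cdots d_n}p_0 \qquad (9.10)$$

where $p_0$ can be determined from the fact that

$$\sum_{n=0}^{\infty} p_n = \left(1 + \frac{a_0}{d_1} + \frac{a_0 a_1}{d_1 d_2} + \cdots\right)p_0 = 1 \qquad (9.11)$$

provided that the summation in parentheses converges to a finite value.

9.4 THE M/M/1 QUEUEING SYSTEM

In the M/M/1 queueing system, the arrival process is the Poisson process with rate $\lambda$ (the mean arrival rate) and the service time is exponentially distributed with parameter $\mu$ (the mean service rate). Then the process $N(t)$ describing the state of the M/M/1 queueing system at time $t$ is a birth-death process with the following state independent parameters:

$$a_n = \lambda \quad n \ge 0 \qquad d_n = \mu \quad n \ge 1$$

Then from Eqs. (9.10) and (9.11), we obtain (Prob. 9.3)

$$p_0 = 1 - \frac{\lambda}{\mu} = 1 - \rho \qquad (9.13)$$

$$p_n = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n = (1 - \rho)\rho^n \qquad (9.14)$$

where $\rho = \lambda/\mu < 1$, which implies that the server, on the average, must process the customers faster than their average arrival rate; otherwise the queue length (the number of customers waiting in the queue) tends to infinity. The ratio $\rho = \lambda/\mu$ is sometimes referred to as the traffic intensity of the system. The traffic intensity of the system is defined as

$$\text{Traffic intensity} = \frac{\text{mean service time}}{\text{mean interarrival time}} = \frac{\text{mean arrival rate}}{\text{mean service rate}}$$

The average number of customers in the system is given by (Prob. 9.4)

$$L = \frac{\rho}{1 - \rho} = \frac{\lambda}{\mu - \lambda} \qquad (9.15)$$

Then, setting $\lambda_a = \lambda$ in Eqs. (9.2) to (9.4), we obtain (Prob. 9.5)

$$W = \frac{1}{\mu - \lambda} = \frac{1}{\mu(1 - \rho)} \qquad (9.16)$$

$$W_q = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{\rho}{\mu(1 - \rho)} \qquad (9.17)$$

$$L_q = \frac{\lambda^2}{\mu(\mu - \lambda)} = \frac{\rho^2}{1 - \rho} \qquad (9.18)$$

9.5 THE M/M/s QUEUEING SYSTEM

In the M/M/s queueing system, the arrival process is the Poisson process with rate $\lambda$ and each of the $s$ servers has an exponential service time with parameter $\mu$. In this case, the process $N(t)$ describing the state of the M/M/s queueing system at time $t$ is a birth-death process with the following parameters:

$$a_n = \lambda \quad n \ge 0 \qquad d_n = \begin{cases} n\mu & 0 < n < s \\ s\mu & n \ge s \end{cases} \qquad (9.19)$$

Note that the departure parameter $d_n$ is state dependent. Then, from Eqs. (9.10) and (9.11), we obtain (Prob. 9.10)

$$p_0 = \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{(s\rho)^s}{s!(1 - \rho)}\right]^{-1} \qquad (9.20)$$

$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n < s \\[2ex] \dfrac{\rho^n s^s}{s!}\,p_0 & n \ge s \end{cases} \qquad (9.21)$$

where $\rho = \lambda/(s\mu) < 1$. Note that the ratio $\rho = \lambda/(s\mu)$ is the traffic intensity of the M/M/s queueing system. The average number of customers in the system and the average number of customers in the queue are given, respectively, by (Prob. 9.12)

$$L = s\rho + \frac{\rho(s\rho)^s}{s!(1 - \rho)^2}\,p_0 \qquad (9.22)$$

$$L_q = \frac{\rho(s\rho)^s}{s!(1 - \rho)^2}\,p_0 = L - s\rho \qquad (9.23)$$

By Eqs. (9.2) and (9.3), the quantities $W$ and $W_q$ are given by

$$W = \frac{L}{\lambda} \qquad (9.24)$$

$$W_q = \frac{L_q}{\lambda} \qquad (9.25)$$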
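The M/M/s quantities of Secs. 9.4 and 9.5 are straightforward to evaluate numerically. The following is a minimal sketch, assuming the standard steady-state M/M/s results for $p_0$, $L_q$, and $L$ [Eqs. (9.20), (9.22), and (9.23)] together with Little's formula; the function name `mms_metrics` and the dictionary keys are illustrative choices. With `s = 1` it reduces to the M/M/1 system.

```python
from math import factorial


def mms_metrics(lam, mu, s):
    """Steady-state quantities of an M/M/s queue.

    lam: mean arrival rate, mu: mean service rate per server, s: number of servers.
    Requires traffic intensity rho = lam / (s * mu) < 1.
    """
    rho = lam / (s * mu)
    if not 0 < rho < 1:
        raise ValueError("steady state requires 0 < lam / (s * mu) < 1")

    offered_load = s * rho  # equals lam / mu
    # Contribution of the states with all s servers busy, summed as a geometric series.
    tail = offered_load ** s / (factorial(s) * (1 - rho))
    p0 = 1.0 / (sum(offered_load ** n / factorial(n) for n in range(s)) + tail)

    lq = rho * tail * p0 / (1 - rho)   # average number waiting in the queue
    l = lq + offered_load              # average number in the system
    return {"p0": p0, "L": l, "Lq": lq, "W": l / lam, "Wq": lq / lam}


# Single server: one arrival per 10 minutes, mean service time 8 minutes
# (the rates used later in Prob. 9.7).
print(mms_metrics(lam=1 / 10, mu=1 / 8, s=1))

# Two servers sharing a combined arrival stream of 33 jobs per hour,
# mean service time 3 minutes (the rates used later in Prob. 9.13).
print(mms_metrics(lam=33 / 60, mu=1 / 3, s=2))
```

Time units are whatever units the rates are given in; here both examples use minutes.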

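9.6 THE M/M/1/K QUEUEING SYSTEM

In the M/M/1/K queueing system, the capacity of the system is limited to $K$ customers. When the system reaches its capacity, the effect is to reduce the arrival rate to zero until such time as a customer is served to again make queue space available. Thus, the M/M/1/K queueing system can be modeled as a birth-death process with the following parameters:

$$a_n = \begin{cases} \lambda & 0 \le n < K \\ 0 & n \ge K \end{cases} \qquad d_n = \mu \quad n \ge 1 \qquad (9.26)$$

Then, from Eqs. (9.10) and (9.11), we obtain (Prob. 9.14)

$$p_0 = \frac{1 - \rho}{1 - \rho^{K+1}} \qquad \rho \ne 1 \qquad (9.27)$$

$$p_n = \rho^n p_0 = \frac{(1 - \rho)\rho^n}{1 - \rho^{K+1}} \qquad n = 1, \ldots, K \qquad (9.28)$$

where $\rho = \lambda/\mu$. It is important to note that it is no longer necessary that the traffic intensity $\rho = \lambda/\mu$ be less than 1. Customers are denied service when the system is in state $K$. Since the fraction of arrivals that actually enter the system is $1 - p_K$, the effective arrival rate is given by

$$\lambda_e = \lambda(1 - p_K)$$

The average number of customers in the system is given by (Prob. 9.15)

$$L = \rho\,\frac{1 - (K + 1)\rho^K + K\rho^{K+1}}{(1 - \rho)(1 - \rho^{K+1})} \qquad (9.30)$$

Then, setting $\lambda_a = \lambda_e$ in Eqs. (9.2) to (9.4), we obtain

$$W = \frac{L}{\lambda_e} = \frac{L}{\lambda(1 - p_K)} \qquad W_q = W - \frac{1}{\mu} \qquad L_q = \lambda_e W_q = \lambda(1 - p_K)W_q$$

9.7 THE M/M/s/K QUEUEING SYSTEM

Similarly, the M/M/s/K queueing system can be modeled as a birth-death process with the following parameters:

$$a_n = \begin{cases} \lambda & 0 \le n < K \\ 0 & n \ge K \end{cases} \qquad d_n = \begin{cases} n\mu & 0 < n < s \\ s\mu & n \ge s \end{cases} \qquad (9.34)$$

Then, from Eqs. (9.10) and (9.11), we obtain (Prob. 9.17)

$$p_0 = \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{(s\rho)^s}{s!}\left(\frac{1 - \rho^{K-s+1}}{1 - \rho}\right)\right]^{-1} \qquad (9.35)$$

$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n < s \\[2ex] \dfrac{\rho^n s^s}{s!}\,p_0 & s \le n \le K \end{cases} \qquad (9.36)$$

where $\rho = \lambda/(s\mu)$. Note that the expression for $p_n$ is identical in form to that for the M/M/s system, Eq. (9.21). They differ only in the $p_0$ term. Again, it is not necessary that $\rho = \lambda/(s\mu)$ be less than 1. The average number of customers in the queue is given by (Prob. 9.18)

$$L_q = p_0\,\frac{\rho(s\rho)^s}{s!(1 - \rho)^2}\,\{1 - [1 + (1 - \rho)(K - s)]\rho^{K-s}\} \qquad (9.37)$$

The average number of customers in the system is

$$L = L_q + \frac{\lambda_e}{\mu} = L_q + \frac{\lambda}{\mu}(1 - p_K)$$

The quantities $W$ and $W_q$ are given by

$$W = \frac{L}{\lambda_e} = \frac{L}{\lambda(1 - p_K)}$$

$$W_q = \frac{L_q}{\lambda_e} = \frac{L_q}{\lambda(1 - p_K)} \qquad (9.40)$$

Solved Problems

9.1. Deduce the basic cost identity (9.1).

Let $T$ be a fixed large number. The amount of money earned by the system by time $T$ can be computed by multiplying the average rate at which the system earns by the length of time $T$. On the other hand, it can also be computed by multiplying the average amount paid by an entering customer by the average number of customers entering by time $T$, which is equal to $\lambda_a T$, where $\lambda_a$ is the average arrival rate of entering customers. Thus, we have

Average rate at which the system earns × $T$ = average amount paid by an entering customer × $(\lambda_a T)$

Dividing both sides by $T$ (and letting $T \to \infty$), we obtain Eq. (9.1).

9.2. Derive Eq. (9.6).

From properties 1 to 3 of the birth-death process $N(t)$, we see that at time $t + \Delta t$, the system can be in state $S_n$ in three ways:

1. By being in state $S_n$ at time $t$ and no transition occurring in the time interval $(t, t + \Delta t)$. This happens with probability $(1 - a_n \Delta t)(1 - d_n \Delta t) \approx 1 - (a_n + d_n)\Delta t$ [by neglecting the second-order effect $a_n d_n (\Delta t)^2$].
2. By being in state $S_{n-1}$ at time $t$ and a transition to $S_n$ occurring in the time interval $(t, t + \Delta t)$. This happens with probability $a_{n-1}\Delta t$.
3. By being in state $S_{n+1}$ at time $t$ and a transition to $S_n$ occurring in the time interval $(t, t + \Delta t)$. This happens with probability $d_{n+1}\Delta t$.

Let

$$p_i(t) = P[N(t) = i]$$

Then, using the Markov property of $N(t)$, we obtain

$$p_n(t + \Delta t) = [1 - (a_n + d_n)\Delta t]\,p_n(t) + a_{n-1}\Delta t\,p_{n-1}(t) + d_{n+1}\Delta t\,p_{n+1}(t) \qquad n \ge 1$$
$$p_0(t + \Delta t) = [1 - (a_0 + d_0)\Delta t]\,p_0(t) + d_1\Delta t\,p_1(t)$$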

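Rearranging the above relations

$$\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} = -(a_n + d_n)p_n(t) + a_{n-1}p_{n-1}(t) + d_{n+1}p_{n+1}(t) \qquad n \ge 1$$
$$\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -(a_0 + d_0)p_0(t) + d_1 p_1(t)$$

Letting $\Delta t \to 0$, we obtain

$$p_n'(t) = -(a_n + d_n)p_n(t) + a_{n-1}p_{n-1}(t) + d_{n+1}p_{n+1}(t) \qquad n \ge 1$$
$$p_0'(t) = -(a_0 + d_0)p_0(t) + d_1 p_1(t)$$

9.3. Derive Eqs. (9.13) and (9.14).

Setting $a_n = \lambda$, $d_0 = 0$, and $d_n = \mu$ in Eq. (9.10), we get

$$p_1 = \frac{\lambda}{\mu}p_0 = \rho p_0 \qquad p_2 = \left(\frac{\lambda}{\mu}\right)^2 p_0 = \rho^2 p_0 \qquad \ldots \qquad p_n = \left(\frac{\lambda}{\mu}\right)^n p_0 = \rho^n p_0$$

where $p_0$ is determined by equating

$$\sum_{n=0}^{\infty} p_n = p_0\sum_{n=0}^{\infty} \rho^n = \frac{p_0}{1 - \rho} = 1 \qquad |\rho| < 1$$

from which we obtain

$$p_0 = 1 - \rho = 1 - \frac{\lambda}{\mu} \qquad p_n = \rho^n p_0 = (1 - \rho)\rho^n = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n$$

9.4. Derive Eq. (9.15).

Since $p_n$ is the steady-state probability that the system contains exactly $n$ customers, using Eq. (9.14), the average number of customers in the M/M/1 queueing system is given by

$$L = \sum_{n=0}^{\infty} n p_n = \sum_{n=0}^{\infty} n(1 - \rho)\rho^n = (1 - \rho)\sum_{n=0}^{\infty} n\rho^n$$

where $\rho = \lambda/\mu < 1$. Using the algebraic identity

$$\sum_{n=0}^{\infty} n x^n = \frac{x}{(1 - x)^2} \qquad |x| < 1$$

we obtain

$$L = \frac{\rho}{1 - \rho} = \frac{\lambda/\mu}{1 - \lambda/\mu} = \frac{\lambda}{\mu - \lambda}$$

9.5. Derive Eqs. (9.16) to (9.18).

Since $\lambda_a = \lambda$, by Eqs. (9.2) and (9.15), we get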

$$W = \frac{L}{\lambda} = \frac{1}{\mu - \lambda} = \frac{1}{\mu(1 - \rho)}$$

which is Eq. (9.16). Next, by definition,

$$W_q = W - W_s \qquad (9.43)$$

where $W_s = 1/\mu$, that is, the average service time. Thus,

$$W_q = \frac{1}{\mu - \lambda} - \frac{1}{\mu} = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{\rho}{\mu(1 - \rho)}$$

which is Eq. (9.17). Finally, by Eq. (9.3),

$$L_q = \lambda W_q = \frac{\lambda^2}{\mu(\mu - \lambda)} = \frac{\rho^2}{1 - \rho}$$

which is Eq. (9.18).

9.6. Let $W_a$ denote the amount of time an arbitrary customer spends in the M/M/1 queueing system. Find the distribution of $W_a$.

We have

$$P\{W_a \le a\} = \sum_{n=0}^{\infty} P\{W_a \le a \mid n \text{ in the system when the customer arrives}\} \times P\{n \text{ in the system when the customer arrives}\} \qquad (9.44)$$

where $n$ is the number of customers in the system. Now consider the amount of time $W_a$ that this customer will spend in the system when there are already $n$ customers when he or she arrives. When $n = 0$, $W_a$ is just the customer's own service time. When $n \ge 1$, there will be one customer in service and $n - 1$ customers waiting in line ahead of this customer's arrival. The customer in service might have been in service for some time, but because of the memoryless property of the exponential distribution of the service time, it follows that (see Prob. 2.57) the arriving customer would have to wait an exponential amount of time with parameter $\mu$ for this customer to complete service. In addition, the customer also would have to wait an exponential amount of time for each of the other $n - 1$ customers in line. Thus, adding his or her own service time, the amount of time $W_a$ that the customer will spend in the system when there are already $n$ customers when he or she arrives is the sum of $n + 1$ iid exponential r.v.'s with parameter $\mu$. Then by Prob. 4.33, we see that this r.v. is a gamma r.v. with parameters $(n + 1, \mu)$. Thus, by Eq. (2.83),

$$P\{W_a \le a \mid n \text{ in the system when customer arrives}\} = \int_0^a \mu e^{-\mu t}\,\frac{(\mu t)^n}{n!}\,dt$$

From Eq. (9.14),

$$P\{n \text{ in the system when customer arrives}\} = p_n = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n$$

Hence, by Eq. (9.44),

$$P\{W_a \le a\} = \sum_{n=0}^{\infty}\left[\int_0^a \mu e^{-\mu t}\,\frac{(\mu t)^n}{n!}\,dt\right]\left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n = \int_0^a (\mu - \lambda)e^{-\mu t}\sum_{n=0}^{\infty}\frac{(\lambda t)^n}{n!}\,dt = \int_0^a (\mu - \lambda)e^{-(\mu - \lambda)t}\,dt = 1 - e^{-(\mu - \lambda)a}$$

Thus, by Eq. (2.79), $W_a$ is an exponential r.v. with parameter $\mu - \lambda$. Note that from Eq. (2.99), $E(W_a) = 1/(\mu - \lambda)$, which agrees with Eq. (9.16), since $W = E(W_a)$.

9.7. Customers arrive at a watch repair shop according to a Poisson process at a rate of one per every 10 minutes, and the service time is an exponential r.v. with mean 8 minutes.

(a) Find the average number of customers $L$, the average time a customer spends in the shop $W$, and the average time a customer spends in waiting for service $W_q$.

(b) Suppose that the arrival rate of the customers increases 10 percent. Find the corresponding changes in $L$, $W$, and $W_q$.

(a) The watch repair shop service can be modeled as an M/M/1 queueing system with $\lambda = \tfrac{1}{10}$, $\mu = \tfrac{1}{8}$. Thus, from Eqs. (9.15), (9.16), and (9.43), we have

$$L = \frac{\lambda}{\mu - \lambda} = \frac{\tfrac{1}{10}}{\tfrac{1}{8} - \tfrac{1}{10}} = 4$$

$$W = \frac{1}{\mu - \lambda} = \frac{1}{\tfrac{1}{8} - \tfrac{1}{10}} = 40 \text{ minutes}$$

$$W_q = W - W_s = 40 - 8 = 32 \text{ minutes}$$

(b) Now $\lambda = \tfrac{1}{9}$ (that is, one arrival every 9 minutes on average), $\mu = \tfrac{1}{8}$. Then

$$L = \frac{\lambda}{\mu - \lambda} = \frac{\tfrac{1}{9}}{\tfrac{1}{8} - \tfrac{1}{9}} = 8$$

$$W = \frac{1}{\mu - \lambda} = \frac{1}{\tfrac{1}{8} - \tfrac{1}{9}} = 72 \text{ minutes}$$

$$W_q = W - W_s = 72 - 8 = 64 \text{ minutes}$$

It can be seen that an increase of 10 percent in the customer arrival rate doubles the average number of customers in the system. The average time a customer spends in queue is also doubled.

9.8. A drive-in banking service is modeled as an M/M/1 queueing system with customer arrival rate of 2 per minute. It is desired to have fewer than 5 customers line up 99 percent of the time. How fast should the service rate be?

From Eq. (9.14),

$$P\{5 \text{ or more customers in the system}\} = \sum_{n=5}^{\infty} p_n = \sum_{n=5}^{\infty} (1 - \rho)\rho^n = \rho^5 \qquad \rho = \frac{\lambda}{\mu}$$

In order to have fewer than 5 customers line up 99 percent of the time, we require that this probability be less than 0.01. Thus,

$$\rho^5 = \left(\frac{\lambda}{\mu}\right)^5 \le 0.01$$

from which we obtain

$$\mu^5 \ge \frac{\lambda^5}{0.01} = \frac{2^5}{0.01} = 3200 \qquad \text{or} \qquad \mu \ge 5.024$$

Thus, to meet the requirements, the average service rate must be at least 5.024 customers per minute.

9.9. People arrive at a telephone booth according to a Poisson process at an average rate of 12 per hour, and the average time for each call is an exponential r.v. with mean 2 minutes.

(a) What is the probability that an arriving customer will find the telephone booth occupied?
(b) It is the policy of the telephone company to install additional booths if customers wait an average of 3 or more minutes for the phone. Find the average arrival rate needed to justify a second booth.

(a) The telephone service can be modeled as an M/M/1 queueing system with $\lambda = \tfrac{1}{5}$, $\mu = \tfrac{1}{2}$, and $\rho = \lambda/\mu = \tfrac{2}{5}$. The probability that an arriving customer will find the telephone occupied is $P(N > 0)$, where $N$ is the number of customers in the system. Thus, from Eq. (9.13),

$$P(N > 0) = 1 - p_0 = 1 - (1 - \rho) = \rho = \tfrac{2}{5} = 0.4$$

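(b) From Eq. (9.17),

$$W_q = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{\lambda}{0.5(0.5 - \lambda)} \ge 3$$

from which we obtain $\lambda \ge 0.3$ per minute. Thus, the required average arrival rate to justify the second booth is 18 per hour.

9.10. Derive Eqs. (9.20) and (9.21).

From Eqs. (9.19) and (9.10), we have

$$p_n = p_0\prod_{k=0}^{n-1}\frac{\lambda}{(k + 1)\mu} = p_0\left(\frac{\lambda}{\mu}\right)^n\frac{1}{n!} \qquad n < s \qquad (9.46)$$

$$p_n = p_0\prod_{k=0}^{s-1}\frac{\lambda}{(k + 1)\mu}\prod_{k=s}^{n-1}\frac{\lambda}{s\mu} = p_0\left(\frac{\lambda}{\mu}\right)^n\frac{1}{s!\,s^{n-s}} \qquad n \ge s \qquad (9.47)$$

Let $\rho = \lambda/(s\mu)$. Then Eqs. (9.46) and (9.47) can be rewritten as

$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n < s \\[2ex] \dfrac{\rho^n s^s}{s!}\,p_0 & n \ge s \end{cases}$$

which is Eq. (9.21). From Eq. (9.11), $p_0$ is obtained by equating

$$\sum_{n=0}^{\infty} p_n = p_0\left[\sum_{n=0}^{s-1}\frac{(s\rho)^n}{n!} + \sum_{n=s}^{\infty}\frac{\rho^n s^s}{s!}\right] = 1$$

Using the summation formula

$$\sum_{n=k}^{\infty} x^n = \frac{x^k}{1 - x} \qquad |x| < 1$$

we obtain Eq. (9.20); that is,

$$p_0 = \left[\sum_{n=0}^{s-1}\frac{(s\rho)^n}{n!} + \frac{s^s}{s!}\sum_{n=s}^{\infty}\rho^n\right]^{-1} = \left[\sum_{n=0}^{s-1}\frac{(s\rho)^n}{n!} + \frac{(s\rho)^s}{s!(1 - \rho)}\right]^{-1}$$

provided $\rho = \lambda/(s\mu) < 1$.

9.11. Consider an M/M/s queueing system. Find the probability that an arriving customer is forced to join the queue.

An arriving customer is forced to join the queue when all servers are busy, that is, when the number of customers in the system is equal to or greater than $s$. Thus, using Eqs. (9.20) and (9.21), we get

$$P(\text{a customer is forced to join queue}) = \sum_{n=s}^{\infty} p_n = p_0\,\frac{s^s}{s!}\sum_{n=s}^{\infty}\rho^n = p_0\,\frac{(s\rho)^s}{s!(1 - \rho)} \qquad (9.49)$$

Equation (9.49) is sometimes referred to as Erlang's delay (or C) formula and denoted by $C(s, \lambda/\mu)$. Equation (9.49) is widely used in telephone systems and gives the probability that no trunk (server) is available for an incoming call (arriving customer) in a system of $s$ trunks.

9.12. Derive Eqs. (9.22) and (9.23).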
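Erlang's delay formula of Prob. 9.11 can be evaluated directly from its definition. The sketch below assumes the standard form $C(s, \lambda/\mu) = p_0 (s\rho)^s / [s!(1 - \rho)]$ with $p_0$ taken from Eq. (9.20); the function name `erlang_c` and the argument name `offered_load` (meaning $\lambda/\mu$) are illustrative choices.

```python
from math import factorial


def erlang_c(s, offered_load):
    """Probability that an arriving customer must wait in an M/M/s queue.

    s: number of servers; offered_load: lam / mu, which must be below s
    so that the traffic intensity rho = offered_load / s is less than 1.
    """
    rho = offered_load / s
    if not 0 <= rho < 1:
        raise ValueError("steady state requires 0 <= offered_load < s")

    # Weight of the states n >= s (all servers busy), relative to p0.
    busy = offered_load ** s / (factorial(s) * (1 - rho))
    # Weight of the states n < s (at least one idle server), relative to p0.
    idle = sum(offered_load ** n / factorial(n) for n in range(s))
    return busy / (idle + busy)


# With a single server the formula reduces to rho, the probability that the server is busy.
print(erlang_c(1, 0.4))

# Two servers with offered load 1.65 (traffic intensity 0.825).
print(erlang_c(2, 1.65))
```

For a single server the result equals the traffic intensity $\rho$, which agrees with the booth-occupied probability found in Prob. 9.9(a).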

299 CHAP. 91 QUEUEING THEORY Equation (9.21) can be rewritten as Then the average number of customers in the system is " - ' (spy' Using the summation formulas, n = 0 n = 0 and Eq. (9.20), we obtain Next, using Eqs. (9.21) and (9.50), the average number of customers in the queue is A corporate computing center has two computers of the same capacity. The jobs arriving at the center are of two types, internal jobs and external jobs. These jobs have Poisson arrival times with rates 18 and 15 per hour, respectively. The service time for a job is an exponential r.v. with mean 3 minutes. (a) Find the average waiting time per job when one computer is used exclusively for internal jobs and the other for external jobs. (b) Find the average waiting time per job when two computers handle both types of jobs. (a) When the computers are used separately, we treat them as two M/M/1 queueing systems. Let W,, and W,, be the average waiting time per internal job and per external job, respectively. For internal jobs,

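$\lambda_1 = \tfrac{18}{60} = \tfrac{3}{10}$ and $\mu_1 = \tfrac{1}{3}$. Then, from Eq. (9.17),

$$W_{q1} = \frac{\tfrac{3}{10}}{\tfrac{1}{3}\left(\tfrac{1}{3} - \tfrac{3}{10}\right)} = 27 \text{ min}$$

For external jobs, $\lambda_2 = \tfrac{15}{60} = \tfrac{1}{4}$ and $\mu_2 = \tfrac{1}{3}$, and

$$W_{q2} = \frac{\tfrac{1}{4}}{\tfrac{1}{3}\left(\tfrac{1}{3} - \tfrac{1}{4}\right)} = 9 \text{ min}$$

(b) When two computers handle both types of jobs, we model the computing service as an M/M/2 queueing system with

$$\lambda = \lambda_1 + \lambda_2 = \tfrac{33}{60} \qquad \mu = \tfrac{1}{3} \qquad \rho = \frac{\lambda}{2\mu} = \tfrac{33}{40}$$

Now, substituting $s = 2$ in Eqs. (9.20), (9.22), (9.24), and (9.25), we get

$$p_0 = \left[1 + 2\rho + \frac{(2\rho)^2}{2(1 - \rho)}\right]^{-1} = \frac{1 - \rho}{1 + \rho}$$

$$L = 2\rho + \frac{4\rho^3}{2(1 - \rho)^2}\left(\frac{1 - \rho}{1 + \rho}\right) = \frac{2\rho}{1 - \rho^2}$$

$$W_q = \frac{L}{\lambda} - \frac{1}{\mu} = \frac{\rho^2}{\mu(1 - \rho^2)} \qquad (9.54)$$

Thus, from Eq. (9.54), the average waiting time per job when both computers handle both types of jobs is given by

$$W_q = \frac{\left(\tfrac{33}{40}\right)^2}{\tfrac{1}{3}\left[1 - \left(\tfrac{33}{40}\right)^2\right]} = 6.39 \text{ min}$$

From these results, we see that it is more efficient for both computers to handle both types of jobs.

9.14. Derive Eqs. (9.27) and (9.28).

From Eqs. (9.26) and (9.10), we have

$$p_n = \left(\frac{\lambda}{\mu}\right)^n p_0 = \rho^n p_0 \qquad 0 \le n \le K$$

From Eq. (9.11), $p_0$ is obtained by equating

$$\sum_{n=0}^{K} p_n = p_0\sum_{n=0}^{K}\rho^n = 1$$

Using the summation formula

$$\sum_{n=0}^{K} x^n = \frac{1 - x^{K+1}}{1 - x} \qquad (9.56)$$

we obtain

$$p_0 = \frac{1 - \rho}{1 - \rho^{K+1}} \qquad \text{and} \qquad p_n = \frac{(1 - \rho)\rho^n}{1 - \rho^{K+1}}$$

Note that in this case, there is no need to impose the condition that $\rho = \lambda/\mu < 1$.

9.15. Derive Eq. (9.30).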

301 CHAP. 91 QUEUEING THEORY 293 Using Eqs. (9.28) and (9.51), the average number of customers in the system is given by Consider the M/M/l/K queueing system. Show that Lq = L - (1 - p,) 1 wq=-l P In the M/M/l/K queueing system, the average number of customers in the system is The average number of customers in the queue is K L = E(N) = np, and 1 p, = 1 n = 0 n = 0 K A customer arriving with the queue in state S, has a wait time T, that is the sum of n independent exponential r.v.'s, each with parameter p. The expected value of this sum is n/p [Eq. ( Thus, the average amount of time that a customer spends waiting in the queue is Simililarly, the amount of time that a customer spends in the system is Note that Eqs. (9.57) to (9.59) are equivalent to Eqs. (9.31) to (9.33) (Prob. 9.27) Derive Eqs. (9.35) and (9.36). As in Prob. 9.10, from Eqs. (9.34) and (9.10), we have Let p = rl/(sp). Then Eqs. (9.60) and (9.61) can be rewritten as

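$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n < s \\[2mm] \dfrac{s^s\rho^n}{s!}\,p_0 & s \le n \le K \end{cases}$$

which is Eq. (9.36). From Eq. (9.11), $p_0$ is obtained by equating

$$\sum_{n=0}^{K} p_n = p_0\left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{s^s}{s!}\sum_{n=s}^{K} \rho^n\right] = 1$$

Using the summation formula (9.56), we obtain

$$p_0 = \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{(s\rho)^s\left(1 - \rho^{K-s+1}\right)}{s!\,(1-\rho)}\right]^{-1}$$

which is Eq. (9.35).

Derive Eq. (9.37).

Using Eqs. (9.36) and (9.51), the average number of customers in the queue is given by

$$L_q = \sum_{n=s}^{K} (n-s)p_n = p_0\,\frac{(s\rho)^s}{s!}\sum_{n=s}^{K} (n-s)\rho^{n-s} = p_0\,\frac{(s\rho)^s}{s!}\sum_{m=0}^{K-s} m\rho^m$$

$$= p_0\,\frac{\rho(s\rho)^s}{s!\,(1-\rho)^2}\left[1 - (K-s+1)\rho^{K-s} + (K-s)\rho^{K-s+1}\right]$$

Consider an M/M/s/s queueing system. Find the probability that all servers are busy.

Setting K = s in Eqs. (9.60) and (9.61), we get

$$p_n = \frac{1}{n!}\left(\frac{\lambda}{\mu}\right)^n p_0 \qquad 0 \le n \le s$$

and $p_0$ is obtained by equating

$$\sum_{n=0}^{s} p_n = p_0 \sum_{n=0}^{s} \frac{1}{n!}\left(\frac{\lambda}{\mu}\right)^n = 1$$

Thus

$$p_0 = \left[\sum_{n=0}^{s} \frac{(\lambda/\mu)^n}{n!}\right]^{-1}$$

The probability that all servers are busy is given by

$$p_s = \frac{(\lambda/\mu)^s/s!}{\sum_{n=0}^{s} (\lambda/\mu)^n/n!} \tag{9.64}$$

Note that in an M/M/s/s queueing system, if an arriving customer finds that all servers are busy, the customer will turn away and is lost. In a telephone system with s trunks, $p_s$ is the portion of incoming calls which will receive a busy signal. Equation (9.64) is often referred to as Erlang's loss (or B) formula and is commonly denoted as $B(s, \lambda/\mu)$.

An air freight terminal has four loading docks on the main concourse. Any aircraft which arrive when all docks are full are diverted to docks on the back concourse. The average aircraft arrival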

rate is 3 aircraft per hour. The average service time per aircraft is 2 hours on the main concourse and 3 hours on the back concourse. Find the percentage of the arriving aircraft that are diverted to the back concourse. If a holding area which can accommodate up to 8 aircraft is added to the main concourse, find the percentage of the arriving aircraft that are diverted to the back concourse and the expected delay time awaiting service.

The service system at the main concourse can be modeled as an M/M/s/s queueing system with s = 4, $\lambda = 3$, $\mu = \frac{1}{2}$, and $\lambda/\mu = 6$. The percentage of the arriving aircraft that are diverted to the back concourse is

100 × P(all docks on the main concourse are full)

From Eq. (9.64),

$$P(\text{all docks on the main concourse are full}) = p_4 = \frac{6^4/4!}{\sum_{n=0}^{4} 6^n/n!} = \frac{54}{115} \approx 0.47$$

Thus, the percentage of the arriving aircraft that are diverted to the back concourse is about 47 percent.

With the addition of a holding area for 8 aircraft, the service system at the main concourse can now be modeled as an M/M/s/K queueing system with s = 4, K = 12, and $\rho = \lambda/(s\mu) = 1.5$. Now, from Eqs. (9.35) and (9.36),

$$p_0 = \left[\sum_{n=0}^{3} \frac{6^n}{n!} + \frac{6^4}{4!}\,\frac{1 - (1.5)^9}{1 - 1.5}\right]^{-1} \approx 0.00024$$

$$p_{12} = \frac{(1.5)^{12}\,4^4}{4!}\,p_0 \approx 0.332$$

Thus, about 33.2 percent of the arriving aircraft will still be diverted to the back concourse. Next, from Eq. (9.37), the average number of aircraft in the queue is

$$L_q = 0.00024\,\frac{1.5\,(6^4)}{4!\,(1-1.5)^2}\left[1 - 9(1.5)^8 + 8(1.5)^9\right] \approx 6.0565$$

Then, from Eq. (9.40), the expected delay time waiting for service is

$$W_q = \frac{L_q}{\lambda(1 - p_{12})} = \frac{6.0565}{3(1 - 0.332)} \approx 3.022 \text{ hours}$$

Note that when the 2-hour service time is added, the total expected processing time at the main concourse will be 5.022 hours compared to the 3-hour service time at the back concourse.

Supplementary Problems

Customers arrive at the express checkout lane in a supermarket in a Poisson process with a rate of 15 per hour. The time to check out a customer is an exponential r.v. with mean of 2 minutes. (a) Find the average number of customers present. (b) What is the expected idle delay time experienced by a customer? (c) What is the expected time for a customer to clear a system?
Ans. (a) 1; (b) 2 min; (c) 4 min

Consider an M/M/1 queueing system. Find the probability of finding at least k customers in the system.
Ans. $\rho^k = (\lambda/\mu)^k$
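The M/M/1 supplementary problems above can be checked numerically. The following is a minimal Python sketch, assuming the standard M/M/1 steady-state results $L = \rho/(1-\rho)$, $W = 1/(\mu-\lambda)$, and $W_q = \lambda/[\mu(\mu-\lambda)]$ with $\rho = \lambda/\mu < 1$. The function names `mm1_metrics` and `prob_at_least` are illustrative and do not come from the text.

```python
def mm1_metrics(lam, mu):
    """Steady-state M/M/1 measures; lam and mu must use the same time unit."""
    if lam >= mu:
        raise ValueError("M/M/1 needs lam < mu for a steady state")
    rho = lam / mu                      # traffic intensity
    L = rho / (1 - rho)                 # average number in the system
    W = 1 / (mu - lam)                  # average time in the system
    Wq = lam / (mu * (mu - lam))        # average waiting time in the queue
    Lq = lam * Wq                       # Little's formula applied to the queue
    return {"rho": rho, "L": L, "Lq": Lq, "W": W, "Wq": Wq}


def prob_at_least(k, lam, mu):
    """P(N >= k) = sum over n >= k of (1 - rho) * rho**n = rho**k."""
    return (lam / mu) ** k


# Express checkout lane: 15 customers per hour, mean service time 2 minutes.
# Rates are converted to customers per minute (assumed unit for this example).
m = mm1_metrics(lam=15 / 60, mu=1 / 2)
print(m["L"], m["Wq"], m["W"])
print(prob_at_least(3, 15 / 60, 1 / 2))
```

With these inputs the sketch reproduces the stated answers of 1 customer, 2 minutes, and 4 minutes. Keeping both rates in the same time unit is the only point that needs care when reusing it for the other M/M/1 problems.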

In a university computer center, 80 jobs an hour are submitted on the average. Assuming that the computer service is modeled as an M/M/1 queueing system, what should the service rate be if the average turnaround time (time at submission to time of getting job back) is to be less than 10 minutes?
Ans. 1.43 jobs per minute

The capacity of a communication line is 2000 bits per second. The line is used to transmit 8-bit characters, and the total volume of expected calls for transmission from many devices to be sent on the line is 12,000 characters per minute. Find (a) the traffic intensity, (b) the average number of characters waiting to be transmitted, and (c) the average transmission (including queueing delay) time per character.
Ans. (a) 0.8; (b) 3.2; (c) 20 ms

A bank counter is currently served by two tellers. Customers entering the bank join a single queue and go to the next available teller when they reach the head of the line. On the average, the service time for a customer is 3 minutes, and 15 customers enter the bank per hour. Assuming that the arrivals process is Poisson and the service time is an exponential r.v., find the probability that a customer entering the bank will have to wait for service.
Ans. 0.205

A post office has three clerks serving at the counter. Customers arrive on the average at the rate of 30 per hour, and arriving customers are asked to form a single queue. The average service time for each customer is 3 minutes. Assuming that the arrivals process is Poisson and the service time is an exponential r.v., find (a) the probability that all the clerks will be busy, (b) the average number of customers in the queue, and (c) the average length of time customers have to spend in the post office.
Ans. (a) 0.237; (b) 0.237; (c) 3.47 min

Show that Eqs. (9.57) to (9.59) and Eqs. (9.31) to (9.33) are equivalent.
Hint: Use Eq. (9.29).

Find the average number of customers L in the M/M/1/K queueing system when $\lambda = \mu$.
Ans. K/2

A gas station has one diesel fuel pump for trucks only and has room for three trucks (including one at the pump). On the average trucks arrive at the rate of 4 per hour, and each truck takes 10 minutes to service. Assume that the arrivals process is Poisson and the service time is an exponential r.v. (a) What is the average time for a truck from entering to leaving the station? (b) What is the average time for a truck to wait for service? (c) What percentage of the truck traffic is being turned away?
Ans. (a) min; (b) min; (c) 12.3 percent

Consider the air freight terminal service of the solved problem above. How many additional docks are needed so that at least 80 percent of the arriving aircraft can be served in the main concourse with the addition of holding area?
Ans. 4
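The multi-server results of this chapter lend themselves to a quick numerical check. The sketch below is a minimal Python illustration, assuming the M/M/s/K steady-state probabilities are proportional to $(\lambda/\mu)^n/n!$ for $n < s$ and to $(\lambda/\mu)^n/(s!\,s^{n-s})$ for $s \le n \le K$, and that the queueing delay follows from Little's formula with the effective arrival rate $\lambda(1 - p_K)$. Setting K = s gives Erlang's loss formula. The M/M/2 helper assumes $W_q = 2\rho^3/[\lambda(1-\rho^2)]$ with $\rho = \lambda/(2\mu)$. All function names are hypothetical and not from the text.

```python
from math import factorial


def mmsk_probs(lam, mu, s, K):
    """Steady-state probabilities p_0..p_K of an M/M/s/K queue (K >= s)."""
    a = lam / mu  # offered load lam/mu
    weights = [
        a**n / factorial(n) if n < s else a**n / (factorial(s) * s ** (n - s))
        for n in range(K + 1)
    ]
    total = sum(weights)  # normalization: the p_n must sum to 1
    return [w / total for w in weights]


def mmsk_metrics(lam, mu, s, K):
    """Return (blocking probability p_K, queue length Lq, queueing delay Wq)."""
    p = mmsk_probs(lam, mu, s, K)
    p_block = p[K]
    Lq = sum((n - s) * p[n] for n in range(s, K + 1))
    Wq = Lq / (lam * (1 - p_block))  # Little's formula, effective arrival rate
    return p_block, Lq, Wq


def mm2_wq(lam, mu):
    """Average queueing delay of an M/M/2 queue with rho = lam / (2 mu) < 1."""
    rho = lam / (2 * mu)
    return 2 * rho**3 / (lam * (1 - rho**2))


# Air freight terminal: 4 docks, 3 arrivals per hour, 2-hour mean service time.
print(mmsk_metrics(3, 0.5, 4, 4)[0])   # Erlang B(4, 6): no holding area
print(mmsk_metrics(3, 0.5, 4, 12))     # holding area for 8 aircraft

# Computing center: pooled M/M/2, 33 jobs per hour, 3-minute mean service time.
print(mm2_wq(33 / 60, 1 / 3))          # result in minutes
```

Run without intermediate rounding, the holding-area case gives a blocking probability of about 33.7 percent and $L_q \approx 6.15$, slightly above the 33.2 percent and 6.0565 quoted in the solved problem. The gap is consistent with $p_0$ having been rounded to 0.00024 before the later multiplications, though that is an inference from the numbers rather than something the text states.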

Appendix A

Fig. A

Table A Normal Distribution

The material below refers to Fig. A.

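Appendix B
Fourier Transform

B.1 CONTINUOUS-TIME FOURIER TRANSFORM

Definition:

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt \qquad x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\,e^{j\omega t}\,d\omega$$

Table B-1 Properties of the Continuous-Time Fourier Transform

Property | Signal | Fourier Transform
(notation) | $x(t)$, $x_1(t)$, $x_2(t)$ | $X(\omega)$, $X_1(\omega)$, $X_2(\omega)$
Linearity | $a_1x_1(t) + a_2x_2(t)$ | $a_1X_1(\omega) + a_2X_2(\omega)$
Time shifting | $x(t - t_0)$ | $e^{-j\omega t_0}X(\omega)$
Frequency shifting | $e^{j\omega_0 t}x(t)$ | $X(\omega - \omega_0)$
Time scaling | $x(at)$ | $\frac{1}{\lvert a\rvert}X\!\left(\frac{\omega}{a}\right)$
Time reversal | $x(-t)$ | $X(-\omega)$
Duality | $X(t)$ | $2\pi x(-\omega)$
Time differentiation | $\frac{dx(t)}{dt}$ | $j\omega X(\omega)$
Frequency differentiation | $(-jt)\,x(t)$ | $\frac{dX(\omega)}{d\omega}$
Integration | $\int_{-\infty}^{t} x(\tau)\,d\tau$ | $\pi X(0)\delta(\omega) + \frac{1}{j\omega}X(\omega)$
Convolution | $x_1(t) * x_2(t)$ | $X_1(\omega)X_2(\omega)$
Multiplication | $x_1(t)\,x_2(t)$ | $\frac{1}{2\pi}X_1(\omega) * X_2(\omega)$
Real signal | $x(t) = x_e(t) + x_o(t)$ | $X(\omega) = A(\omega) + jB(\omega)$, $X(-\omega) = X^*(\omega)$
Even component | $x_e(t)$ | $\mathrm{Re}\{X(\omega)\} = A(\omega)$
Odd component | $x_o(t)$ | $j\,\mathrm{Im}\{X(\omega)\} = jB(\omega)$

Parseval's theorem:

$$\int_{-\infty}^{\infty} \lvert x(t)\rvert^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} \lvert X(\omega)\rvert^2\,d\omega$$

Table B-2 Common Continuous-Time Fourier Transform Pairs

$x(t)$ | $X(\omega)$
$1$ | $2\pi\delta(\omega)$
$e^{j\omega_0 t}$ | $2\pi\delta(\omega - \omega_0)$
$\cos\omega_0 t$ | $\pi[\delta(\omega - \omega_0) + \delta(\omega + \omega_0)]$
$\sin\omega_0 t$ | $-j\pi[\delta(\omega - \omega_0) - \delta(\omega + \omega_0)]$
$t\,e^{-at}u(t)$, $a > 0$ | $\frac{1}{(j\omega + a)^2}$
$e^{-a\lvert t\rvert}$, $a > 0$ | $\frac{2a}{a^2 + \omega^2}$
$\frac{\sin at}{\pi t}$ | $1$ for $\lvert\omega\rvert < a$, $0$ for $\lvert\omega\rvert > a$
$\operatorname{sgn} t$ ($1$ for $t > 0$, $-1$ for $t < 0$) | $\frac{2}{j\omega}$
$1$ for $\lvert t\rvert < a$, $0$ for $\lvert t\rvert > a$ | $2a\,\frac{\sin\omega a}{\omega a}$

B.2 DISCRETE-TIME FOURIER TRANSFORM

Definition:

$$X(\Omega) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\Omega n} \qquad x[n] = \frac{1}{2\pi}\int_{2\pi} X(\Omega)\,e^{j\Omega n}\,d\Omega$$

Table B-3 Properties of the Discrete-Time Fourier Transform

Property | Sequence | Fourier Transform
Frequency shifting | $e^{j\Omega_0 n}x[n]$ | $X(\Omega - \Omega_0)$
Time reversal | $x[-n]$ | $X(-\Omega)$
Frequency differentiation | $n\,x[n]$ | $j\,\frac{dX(\Omega)}{d\Omega}$
Accumulation | $\sum_{k=-\infty}^{n} x[k]$ | $\pi X(0)\delta(\Omega) + \frac{1}{1 - e^{-j\Omega}}X(\Omega)$, $\lvert\Omega\rvert \le \pi$
Convolution | $x_1[n] * x_2[n]$ | $X_1(\Omega)X_2(\Omega)$
Multiplication | $x_1[n]\,x_2[n]$ | $\frac{1}{2\pi}X_1(\Omega) \circledast X_2(\Omega)$ (periodic convolution)
Real sequence | $x[n] = x_e[n] + x_o[n]$ | $X(\Omega) = A(\Omega) + jB(\Omega)$, $X(-\Omega) = X^*(\Omega)$
Even component | $x_e[n]$ | $\mathrm{Re}\{X(\Omega)\} = A(\Omega)$
Odd component | $x_o[n]$ | $j\,\mathrm{Im}\{X(\Omega)\} = jB(\Omega)$

Parseval's theorem:

$$\sum_{n=-\infty}^{\infty} \lvert x[n]\rvert^2 = \frac{1}{2\pi}\int_{2\pi} \lvert X(\Omega)\rvert^2\,d\Omega$$

Table B-4 Common Discrete-Time Fourier Transform Pairs

$x[n]$ | $X(\Omega)$
$u[n]$ | $\pi\delta(\Omega) + \frac{1}{1 - e^{-j\Omega}}$, $\lvert\Omega\rvert \le \pi$
$\frac{\sin Wn}{\pi n}$, $0 < W < \pi$ | $1$ for $\lvert\Omega\rvert \le W$, $0$ for $W < \lvert\Omega\rvert \le \pi$
$a^n u[n]$, $\lvert a\rvert < 1$ | $\frac{1}{1 - ae^{-j\Omega}}$
$(n+1)\,a^n u[n]$, $\lvert a\rvert < 1$ | $\frac{1}{(1 - ae^{-j\Omega})^2}$
$a^{\lvert n\rvert}$, $\lvert a\rvert < 1$ | $\frac{1 - a^2}{1 - 2a\cos\Omega + a^2}$
$1$ for $\lvert n\rvert \le N_1$, $0$ for $\lvert n\rvert > N_1$ | $\frac{\sin[\Omega(N_1 + \frac{1}{2})]}{\sin(\Omega/2)}$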




Probabilistic models

Kolmogorov (Andrei Nikolaevich, 1903-1987) put forward an axiomatic system for probability theory. Foundations of the Calculus of Probabilities, published in 1933, immediately became