G METHOD IN ACTION: FROM EXACT SAMPLING TO APPROXIMATE ONE


G METHOD IN ACTION: FROM EXACT SAMPLING TO APPROXIMATE ONE

UDREA PĂUN

Communicated by Marius Iosifescu

The main contribution of this work is the unification, by the G method using Markov chains (therefore, a Markovian unification), in the finite case, of five (even six) sampling methods from exact and approximate sampling theory. This unification is in conjunction with a main problem, our problem of interest: finding the fastest Markov chains in sampling theory based on Metropolis-Hastings chains and their derivatives. We show, in the finite case, that the cyclic Gibbs sampler (the Gibbs sampler for short) belongs to our collection of hybrid Metropolis-Hastings chains from [U. Păun, A hybrid Metropolis-Hastings chain, Rev. Roumaine Math. Pures Appl. 56 (2011)]. So, we obtain, for any type of Gibbs sampler (cyclic, random, etc.), in the finite case, the structure of the matrices corresponding to the coordinate updates. Concerning our hybrid Metropolis-Hastings chains, to do unifications, and, as a result of these, to do comparisons and improvements, we construct, by the G method, a chain which we call the reference chain. This is a very fast chain because it attains its stationarity at time 1. Moreover, the reference chain is the best one we can construct (concerning our hybrid chains), see the Uniqueness Theorem. The reference chain is constructed such that it can do what an exact sampling method, which we call the reference method, does. This method contains, as special cases, the alias method and the swapping method. Coming back to the reference chain, it is sometimes identical with the Gibbs sampler or with a special Gibbs sampler with grouped coordinates; we illustrate this case for two classes of wavy probability distributions. As a result of these facts, we give a method for generating, exactly (not approximately), a random variable with geometric distribution. Finally, we state the fundamental idea on the speed of convergence: the nearer our hybrid chain is to its reference chain, the faster our hybrid chain is. The addendum shows that we are on the right track.

AMS 2010 Subject Classification: 60J10, 65C05, 65C10, 68U20.

Key words: G method, unification, exact sampling, alias method, swapping method, reference method, reference chain, approximate sampling, hybrid Metropolis-Hastings chain, optimal hybrid Metropolis chain, Gibbs sampler, wavy probability distribution, uniqueness theorem, fundamental idea on the speed of convergence, G comparison.

REV. ROUMAINE MATH. PURES APPL. 62 (2017), 3

1. SOME BASIC THINGS

In this section, we present some basic things (notation, notions, and results) from [7, 8] with completions. [7] refers, especially, to very fast Markov chains (i.e., Markov chains which converge very fast), while in [8], based on [7] (an example there was the starting point), a collection of hybrid Metropolis-Hastings chains is constructed. This collection of Markov chains has something in common (this is interesting!) with some exact sampling methods (see the next sections); one of these methods, the swapping method, in fact suggested the construction of this collection (in the example from [7] mentioned above, there is a chain which can do what the swapping method does). These common things could help us to find fast approximate sampling methods or, at best, fast exact sampling methods (see the fundamental idea on the speed of convergence, etc.). This way of finding efficient exact or approximate sampling methods, based on exact sampling methods, is our way, the best way. On the other hand, the exact sampling methods are important sources for obtaining very fast Markov chains.

Set Par(E) = {Δ | Δ is a partition of E}, where E is a nonempty set. We shall agree that the partitions do not contain the empty set.

Definition 1.1. Let Δ_1, Δ_2 ∈ Par(E). We say that Δ_1 is finer than Δ_2 if ∀V ∈ Δ_1, ∃W ∈ Δ_2 such that V ⊆ W. Write Δ_1 ⪯ Δ_2 when Δ_1 is finer than Δ_2.

In this article, a vector is a row vector and a stochastic matrix is a row stochastic matrix. The entry (i, j) of a matrix Z will be denoted Z_{ij} or, if confusion can arise, Z_{i,j}.

Set ⟨m⟩ = {1, 2, ..., m} (m ≥ 1), ⟨⟨m⟩⟩ = {0, 1, ..., m} (m ≥ 0),

N_{m,n} = {P | P is a nonnegative m × n matrix},

S_{m,n} = {P | P is a stochastic m × n matrix},

N_n = N_{n,n}, S_n = S_{n,n}.

Let P = (P_{ij}) ∈ N_{m,n}. Let ∅ ≠ U ⊆ ⟨m⟩ and ∅ ≠ V ⊆ ⟨n⟩. Set the matrices

P_U = (P_{ij})_{i∈U, j∈⟨n⟩}, P^V = (P_{ij})_{i∈⟨m⟩, j∈V}, and P_U^V = (P_{ij})_{i∈U, j∈V}

3 3 G method in action: from exact sampling to approximate one 45 (e.g., if then, e.g., E.g., Set P = ( 3 4 P {} = ( 3 4 ), P {3} = ( 4 ), ), and P {} {} = () ). ({i}) i {s,s,...,s t} = ({s }, {s },..., {s t }) ; ({i}) i {s,s,...,s t} Par ({s, s,..., s t }). ({i}) i n = ({}, {},..., {n}). Denition.. Let P N m,n. We say that P is a generalized stochastic matrix if a 0, Q S m,n such that P = aq. Denition.3 ([7]). Let P N m,n. Let Par( m ) and Σ Par( n ). We say that P is a [ ]-stable matrix on Σ if PK L is a generalized stochastic matrix, K, L Σ. In particular, a [ ]-stable matrix on ({i}) i n is called [ ]-stable for short. Denition.4 ([7]). Let P N m,n. Let Par( m ) and Σ Par( n ). We say that P is a -stable matrix on Σ if is the least ne partition for which P is a [ ]-stable matrix on Σ. In particular, a -stable matrix on ({i}) i n is called -stable while a ( m )-stable matrix on Σ is called stable on Σ for short. A stable matrix on ({i}) i n is called stable for short. For interesting examples of -stable matrices on Σ for some and Σ, see Sections and 3. Let Par( m ) and Par( n ). Set (see [7] for G, and [8] for _ G, ) and G, = {P P S m,n and P is a [ ] -stable matrix on } _ G, = {P P N m,n and P is a [ ] -stable matrix on }. When we study or even when we construct products of nonnegative matrices (in particular, products of stochastic matrices) using G, or _ G, we shall refer this as the G method. Let _ P G,. Let K and L. Then a K,L 0, Q K,L S K, L such that PK L = a K,LQ K,L. Set P + = ( P + KL )K,L, P + KL = a K,L, K, L

4 46 Udrea Päun 4 (P + KL, K, L, are the entries of matrix P + ). If confusion can arise, we write P +(, ) instead of P +. In this article, when we work with the operator ( ) + = ( ) + (, ), we suppose, for labeling the rows and columns of matrices, that and are ordered sets (i.e., these are sets where the order in which we write their elements counts), even if we omit to specify this. E.g., let P = P G,, where = ({, }, {3}) and = ({, }, {3, 4}). Further, we have ) P + = P +(, ) = ( ({, } and {3} are the rst and the second element of, respectively; based on this order, the rst and the second row of P + are labeled {, } and {3}, respectively. The columns of P + are labeled similarly.) Below we give a basic result. Theorem.5 ([8]). Let P _ G, N m,n and Q _ G, 3 N n,p. Then (i) P Q _ G, 3 N m,p ; (ii) (P Q) + = P + Q +. Proof. See [8]. In this article, the transpose of a vector x is denoted x. Set e = e (n) = (,,..., ) R n, n. Below we give an important result.. Theorem.6 ([8]). Let P _ G ( m ), N m,m, P _ G, 3 N m,m 3,..., P n _ G n, n N mn,m n, P n _ G n,({i}) i mn+ N m n,m n+. Then (i) P P...P n is a stable matrix; (ii) (P P...P n ) {i} = P + P +...Pn +, i m ((P P...P n ) {i} is the row i of P P...P n ); therefore, P P...P n = e π, where π = P + P +...Pn +. Proof. See [8]. Remark.7. Under the assumptions of Theorem.6, but taking P S m,m, P S m,m 3,..., P n S mn,m n, P n S mn,mn+, we have pp P...P n = π

for any probability distribution p on ⟨m_1⟩. Consequently, Theorem 1.6 could be used to prove that certain Markov chains have finite convergence time (see [7] and Sections 2 and 3 for some examples).

Let P ∈ N_{m,n}. Set

α(P) = min_{i,j∈⟨m⟩} Σ_{k=1}^{n} min(P_{ik}, P_{jk})

and

ᾱ(P) = (1/2) max_{i,j∈⟨m⟩} Σ_{k=1}^{n} |P_{ik} − P_{jk}|.

If P ∈ S_{m,n}, then α(P) is called the Dobrushin ergodicity coefficient of P ([5]; see, e.g., also [3, p. 56]).

Theorem 1.8. (i) ᾱ(P) = 1 − α(P), ∀P ∈ S_{m,n}.
(ii) ‖μP − νP‖ ≤ ‖μ − ν‖ ᾱ(P), ∀μ, ν, μ and ν are probability distributions on ⟨m⟩, ∀P ∈ S_{m,n}.
(iii) ᾱ(PQ) ≤ ᾱ(P) ᾱ(Q), ∀P ∈ S_{m,n}, ∀Q ∈ S_{n,p}.

Proof. (i) See, e.g., [3, p. 57] or [4, p. 44]. (ii) See, e.g., [5] or [4, p. 47]. (iii) See, e.g., [5], or [3, pp. 58–59], or [4, p. 45].

Theorem 1.6 (see also Remark 1.7) could be used, e.g., in exact sampling theory based on finite Markov chains (see Section 3; see also Section 2) while the next result could be used, e.g., in approximate sampling theory based on finite Markov chains (see Section 4).

Theorem 1.9 ([8]). Let P_1 ∈ N_{m_1,m_2}, P_2 ∈ N_{m_2,m_3}, ..., P_n ∈ N_{m_n,m_{n+1}}. Let Δ_1 = (⟨m_1⟩), Δ_2 ∈ Par(⟨m_2⟩), ..., Δ_n ∈ Par(⟨m_n⟩), Δ_{n+1} = ({i})_{i∈⟨m_{n+1}⟩}. Consider the matrices L_l = ((L_l)_{VW})_{V∈Δ_l, W∈Δ_{l+1}} ((L_l)_{VW} is the entry (V, W) of matrix L_l), where

(L_l)_{VW} = min_{i∈V} Σ_{j∈W} (P_l)_{ij}, ∀l ∈ ⟨n⟩, ∀V ∈ Δ_l, ∀W ∈ Δ_{l+1}.

Then

α(P_1 P_2 ... P_n) ≥ Σ_{K∈Δ_{n+1}} (L_1 L_2 ... L_n)_{⟨m_1⟩ K}.

(Since L_1 L_2 ... L_n is a 1 × m_{n+1} matrix, it can be thought of as a row vector, but above we used and below we shall use, if necessary, the matrix notation for its entries instead of the vector one. Above the matrix notation (L_1 L_2 ... L_n)_{⟨m_1⟩ K} was used instead of the vector one (L_1 L_2 ... L_n)_K because, in this article, the notation A_U, where A ∈ N_{p,q} and U ⊆ ⟨p⟩, means something different.)
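The quantities α and ᾱ are straightforward to compute. The short Python sketch below (a hypothetical illustration, not code from the paper) evaluates both on randomly generated stochastic matrices and checks numerically the identity in Theorem 1.8(i) and the submultiplicativity in Theorem 1.8(iii).

```python
import numpy as np

def alpha(P):
    """Ergodicity coefficient: min over row pairs of the overlap sum_k min(P_ik, P_jk)."""
    m = P.shape[0]
    return min(np.minimum(P[i], P[j]).sum() for i in range(m) for j in range(m))

def alpha_bar(P):
    """Complementary coefficient: (1/2) max over row pairs of the l1 distance between rows."""
    m = P.shape[0]
    return max(0.5 * np.abs(P[i] - P[j]).sum() for i in range(m) for j in range(m))

rng = np.random.default_rng(0)

def random_stochastic(m, n):
    A = rng.random((m, n))
    return A / A.sum(axis=1, keepdims=True)

P = random_stochastic(4, 5)
Q = random_stochastic(5, 3)

# Theorem 1.8(i): alpha_bar(P) = 1 - alpha(P).
assert abs(alpha_bar(P) - (1 - alpha(P))) < 1e-12

# Theorem 1.8(iii): alpha_bar(PQ) <= alpha_bar(P) * alpha_bar(Q).
assert alpha_bar(P @ Q) <= alpha_bar(P) * alpha_bar(Q) + 1e-12

print("alpha(P) =", alpha(P), " alpha_bar(P) =", alpha_bar(P))
```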

6 48 Udrea Päun 6 Proof. See [8]. (Theorem.9 is part of Theorem.8 from [8].) Denition.0 (see, e.g., [0, p. 80]). Let P N m,n. We say that P is a row-allowable matrix if it has at least one positive entry in each row. Let P N m,n. Set _ (_ ) P = P ij N m,n, _ { if Pij > 0, P ij = 0 if P ij = 0, i m, j n. We call _ P the incidence matrix of P (see, e.g., [3, p. ]). In this article, some statements on the matrices hold, obviously, eventually by permutation of rows and columns. For simplication, further, we omit to specify this fact. Warning! In this article, if a Markov chain has the transition matrix P = P P...P s, where s and P, P,..., P s are stochastic matrices, then any -step transition of this chain is performed via P, P,..., P s, i.e., doing s transitions: one using P, one using P,..., one using P s. (See also Section.) Let S = r. Let π = (π i ) i S = (π, π,..., π r ) be a positive probability distribution on S. One way to sample approximately or, at best, exactly from S when r is by means of the hybrid Metropolis-Hastings chain from [8]. Below we dene this chain. Let E be a nonempty set. Set if and, where, Par(E). Let,,..., t+ Par(S) with = (S)... t+ = ({i}) i S, where t. Let Q, Q,..., Q t S r such that (C) _ Q, Q,..., Q t are symmetric matrices; (C) (Q l ) L K = 0, l t {}, K, L l, K L (this assumption implies that Q l is a block diagonal matrix and l -stable matrix on l, l t {}); (C3) (Q l ) U K is a row-allowable matrix, l t, K l, U l+, U K. Although Q l, l t, are not irreducible matrices if l, we dene the matrices P l, l t, as in the Metropolis-Hastings case ([6] and []; see, e.g., also [7, pp. 3336], [9, Chapter 6], [, pp. 5], [5, pp. 6366], and [, Chapter 0]), namely, ) P l = ((P l ) ij S r, 0 if j i and (Q l ) ij = 0, ( ) (Q (P l ) ij = l ) ij min, π j(q l ) ji π i (Q l ) if j i and (Q l ) ij ij > 0, (P l ) ik if j = i, k i

7 7 G method in action: from exact sampling to approximate one 49 l t. Set P = P P...P t. Theorem. ([8]). Concerning P above we have πp = π and P > 0. Proof. See [8]. By Theorem., P n e π as n. We call the Markov chain with transition matrix P the hybrid Metropolis-Hastings chain. In particular, we call this chain the hybrid Metropolis chain when Q, Q,..., Q t are symmetric matrices. We call the conditions (C)(C3) the basic conditions of hybrid Metropolis- Hastings chain. In particular, we call these conditions the basic conditions of hybrid Metropolis chain when Q, Q,..., Q t are symmetric matrices. The basic conditions (C)(C3) and other conditions, which we call the special conditions, determine special hybrid Metropolis-Hastings chains. E.g., in [8] was considered the next ( special hybrid Metropolis chain. Supposing that l = K (l),..., K(l), l t +, this chain satis-, K(l) u l ) es the conditions (C)(C3) and, moreover, the conditions: (c) K (l) = K (l) =... = K u (l) l, l t + with ul ; (c) r = r r...r t with r r...r l = l+, l t, and r t = K (t) (this condition is compatible with... t+ ); (c3) (c3.) Q l is a symmetric matrix such that (c3.) (Q l ) ii > 0, i S, and (Q l ) i j = (Q l ) i j, i, i, j, j S with i j, i j, and (Q l ) i j, (Q l ) i j > 0, l t ((c3.) says that all the positive entries of Q l, excepting the entries (Q l ) ii, i S, are equal, l t ); (c4) (Q l ) U K has in each row just one positive entry, l t, K l, U l+ with U K (this condition is compatible with (c3.) because (Q l ) W V is a square matrix, l t, V, W l+ ). The condition (c) is superuous because it follows from (C) and (c4). (c) is also superuous because it follows from (c) and... t+. It is interesting to note that the matrices P, P,..., P t satisfy conditions similar to (C)(C3) and, for this special chain, moreover, (c4) simply we replace Q l with P l, l t, in (C)(C3) and, if need be, in (c4). (c)(c) are common conditions for Q, Q,..., Q t and P, P,..., P t. In [8], for the chain satisfying the conditions (C)(C3) and (c)(c4), the positive entries of matrices Q l, l t, were, taking Theorem.9 into account, optimally chosen, i.e., these were chosen such that the lower bound of α (P P...P t ) from Theorem.9 be as large as possible (we need this condition to obtain a chain with a speed of convergence as large as possible). More

8 40 Udrea Päun 8 precisely, setting f l = π j min i,j S,(Q l ) ij >0 π i (do not forget the condition (Q l ) ij > 0!) and x l = (Q l ) ij, where i, j S are xed such that i j and (Q l ) ij > 0 (see (c3) again), it was found (taking Theorem.9 into account) x l = f l + r l. We call this chain the optimal hybrid Metropolis chain with respect to the conditions (C)(C3) and (c)(c4) and the inequality from Theorem.9 we call it the optimal hybrid Metropolis chain for short. In Section 3, we show that the Gibbs sampler on h n, h, n (more generally, on h h... h n, h, h,..., h n, n ) belongs to our collection of hybrid Metropolis-Hastings chains. Moreover, we shall show that the Gibbs sampler on h n satises all the conditions (c)(c4), excepting (c3). As to the estimate of p n π (p n and π are dened below), we have the next result. Theorem. (see, e.g., [8]). Let P S r be an aperiodic irreducible matrix. Consider a Markov chain with transition matrix P and limit probability distribution π. Let p n be the probability distribution of chain at time n, n 0. Then p n π _ α (P n ), n 0 (P 0 = I r ; by Theorem.8(iii), α _ (P n ) (_ α (P ) ) n, n 0, α _ ( _α ( (P n ) P k)) n k, n, k n ( x = max {b b Z, b x}, x R), etc.). Proof. See, e.g., [8] (it is used Theorem.8(ii) for the proof).. EXACT SAMPLING In this section, we consider a similarity relation. This has some interesting properties. Then we consider two methods of generation of the random variables exactly in the nite case only. The rst one, the alias method, is a special case of the second one. For each of these methods, we associate a Markov chain

such that this chain can do what the method does. These associated chains are important for our unification. Finally, we associate a hybrid chain with a reference chain.

Definition 2.1. Let P, Q ∈ Ḡ_{Δ_1,Δ_2} ∩ N_{m,n}. We say that P is similar to Q if P^+ = Q^+. Set P ∼ Q when P is similar to Q.

Obviously, ∼ is an equivalence relation on Ḡ_{Δ_1,Δ_2}.

Theorem 2.2. Let P_1, U_1 ∈ Ḡ_{Δ_1,Δ_2} ∩ N_{m_1,m_2} and P_2, U_2 ∈ Ḡ_{Δ_2,Δ_3} ∩ N_{m_2,m_3}. Suppose that P_1 ∼ U_1 and P_2 ∼ U_2. Then P_1 P_2 ∼ U_1 U_2.

Proof. By Theorem 1.5 we have P_1 P_2, U_1 U_2 ∈ Ḡ_{Δ_1,Δ_3} ∩ N_{m_1,m_3}. By Theorem 1.5 and Definition 2.1 we have

(P_1 P_2)^+ = P_1^+ P_2^+ = U_1^+ U_2^+ = (U_1 U_2)^+.

Therefore, P_1 P_2 ∼ U_1 U_2.

Theorem 2.3. Let P_1, U_1 ∈ Ḡ_{Δ_1,Δ_2} ∩ N_{m_1,m_2}, P_2, U_2 ∈ Ḡ_{Δ_2,Δ_3} ∩ N_{m_2,m_3}, ..., P_n, U_n ∈ Ḡ_{Δ_n,Δ_{n+1}} ∩ N_{m_n,m_{n+1}}. Suppose that P_1 ∼ U_1, P_2 ∼ U_2, ..., P_n ∼ U_n. Then

P_1 P_2 ... P_n ∼ U_1 U_2 ... U_n.

If, moreover, Δ_1 = (⟨m_1⟩) and Δ_{n+1} = ({i})_{i∈⟨m_{n+1}⟩}, then

P_1 P_2 ... P_n = U_1 U_2 ... U_n

(therefore, when Δ_1 = (⟨m_1⟩) and Δ_{n+1} = ({i})_{i∈⟨m_{n+1}⟩}, a product of n representatives, the first of an equivalence class included in Ḡ_{Δ_1,Δ_2}, the second of an equivalence class included in Ḡ_{Δ_2,Δ_3}, ..., the nth of an equivalence class included in Ḡ_{Δ_n,Δ_{n+1}}, does not depend on the choice of representatives).

Proof. The first part follows by Theorem 2.2 and induction. As to the second part, by Theorem 1.6, P_1 P_2 ... P_n and U_1 U_2 ... U_n are stable matrices and, further,

(P_1 P_2 ... P_n)_{{i}} = P_1^+ P_2^+ ... P_n^+ = U_1^+ U_2^+ ... U_n^+ = (U_1 U_2 ... U_n)_{{i}}, ∀i ∈ ⟨m_1⟩.

Therefore, P_1 P_2 ... P_n = U_1 U_2 ... U_n.
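The block computations behind the G method are elementary, and a short sketch may help fix the notation. The following Python snippet (a hypothetical illustration, not code from the paper; the partitions and matrices are chosen only so that the required block structure holds) checks stability on a column partition, computes the operator (·)^+, and verifies Theorem 1.5(ii) and Definition 2.1 numerically.

```python
import numpy as np

def plus(P, row_part, col_part):
    """If every block P[K, L] has a constant row sum (i.e. P is [row_part]-stable
    on col_part), return the matrix P^+ of these common sums; otherwise return None."""
    Pp = np.empty((len(row_part), len(col_part)))
    for a, K in enumerate(row_part):
        for b, L in enumerate(col_part):
            sums = P[np.ix_(K, L)].sum(axis=1)
            if not np.allclose(sums, sums[0]):
                return None          # block is not a generalized stochastic matrix
            Pp[a, b] = sums[0]
    return Pp

# Hypothetical partitions of the row/column index sets (0-based).
D1 = [[0, 1], [2]]            # partition of {0,1,2}
D2 = [[0, 1], [2, 3]]         # partition of {0,1,2,3}
D3 = [[0], [1], [2]]          # partition of {0,1,2} into singletons

P = np.array([[0.20, 0.30, 0.10, 0.40],
              [0.40, 0.10, 0.25, 0.25],
              [0.60, 0.10, 0.20, 0.10]])   # stochastic, [D1]-stable on D2
Q = np.array([[0.5, 0.3, 0.2],
              [0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.1, 0.6, 0.3]])            # stochastic, [D2]-stable on D3
U = np.array([[0.10, 0.40, 0.30, 0.20],
              [0.30, 0.20, 0.00, 0.50],
              [0.35, 0.35, 0.05, 0.25]])   # similar to P: same block row sums

# Theorem 1.5(ii): (PQ)^+ = P^+ Q^+ (with respect to D1 and D3).
assert np.allclose(plus(P @ Q, D1, D3), plus(P, D1, D2) @ plus(Q, D2, D3))

# Definition 2.1: P ~ U because P^+ = U^+.
assert np.allclose(plus(P, D1, D2), plus(U, D1, D2))
print("P^+ =\n", plus(P, D1, D2))
```

Replacing P by the similar matrix U changes the product PQ in general, but not its image under (·)^+ (Theorem 2.2); when Δ_1 = (⟨m_1⟩) and the last partition consists of singletons, even the product itself is unchanged (Theorem 2.3).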

10 4 Udrea Päun 0 The reader is assumed to be acquainted with the rst method below. Recall that for each of the two methods below we associate a Markov chain such that this chain can do what the method does.. The alias method (see, e.g., [4, pp. 073] and [5, pp. 57]). To illustrate our Markovian modeling here, we consider, for simplication, the example from [5, p. 5]. Following this example, we have a random variable X with the values,, 3, 4, 5 and probabilities π = 0.4, π = 0.7, π 3 = 0.07, π 4 = 0.4, π 5 = 0., where π i = P (X = i), i 5. The alias method leads, following the example from [5, p. 5] too, to the table (having rows and 5 columns) , 0.3, etc. are probabilities while,, 3, 4, 5 in bold print are values of X. In each column of the table, the sum of probabilities is equal to 0.0. We associate the alias method for generating X (when this method is applied to the generation of X) with the Markov chain (X n ) n 0 with state space S = {(3, ), (, ), (5, ), (, ), (4, 3), (, 3), (, 4), (, 4), (, 5)} if (x, x ) S, then x denotes a value of X while x denotes the column of table in which the value x is; for x = 5 (column 5), we only consider the state (, 5) because in column 5 the second probability is 0 and transition matrix P = P P, where P = (3, ) (, ) (5, ) (, ) (4, 3) (, 3) (, 4) (, 4) (, 5) (the columns are labeled similarly, i.e., (3, ), (, ), (5, ), (, ), (4, 3), (, 3), (, 4), (, 4), (, 5) from left to right) and

11 G method in action: from exact sampling to approximate one 43 P = (3, ) (, ) (5, ) (, ) (4, 3) (, 3) (, 4) (, 4) (, 5) P G,, P G, 3, where = (S), = ({(3, ), (, )}, {(5, ), (, )}, {(4, 3), (, 3)}, {(, 4), (, 4)}, {(, 5)}), 3 = ({(x, y)}) (x,y) S. By Theorem.6 it follows that P is a stable matrix and, more precisely, where P = e ρ, ρ = (0.07, 0.3, 0., 0.09, 0.4, 0.06, 0.9, 0.0, 0.0) (see the table again). Recall even if, here, P = e ρ that any -step transition of this chain is performed via P, P, i.e., doing two transitions: one using P and the other using P. Passing this Markov chain from an initial state, say, (3, ) (the state at time 0) to a state at time is done using, one after the other, the probability distributions (P ) {(3,)} (this is the rst row of matrix P ) suppose that using this probability distribution the chain arrives at state (i, j) and (P ) {(i,j)}. The alias method for generating X uses these probability distributions too, in the same order, the 0 s do not count, they can be removed e.g., (P ) {(3,)} = (0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0) leads, removing the 0 s, to (0.0, 0.0, 0.0, 0.0, 0.0),

12 44 Udrea Päun which is the probability distribution used by the alias method in its rst step (when, obviously, this method is applied to X from here). Therefore, this chain can do what the alias method does (we need to run this chain just one step (or two steps due to P and P ) until time inclusive). By Theorem.3 we can replace P with any matrix, U, similar to P obviously, it is more advantageous that each of the matrices U {(3,),(,)}, U {(5,),(,)}, U {(4,3),(,3)}, U {(,4),(,4)}, U {(,5)} have in each row just one positive entry. E.g., we can take U = and have (3, ) (, ) (5, ) (, ) (4, 3) (, 3) (, 4) (, 4) (, 5) P = P P = U P.. The reference method for our collection of hybrid chains (in particular, for the Gibbs sampler). We call it the reference method for short. Its name as well as the name of the chain determined of this, called the reference chain (see below for this chain), were inspired by the reference point from physics and other elds. The reference method is an iterative composition method we include the degenerate case when no composition is done (this case corresponds to the case t = of reference method). Below we present the reference method. Let X be a random variable with positive probability distribution π = (π, π,..., π r ) = (π i ) i S, where S = r. Let,,..., t+ Par(S) with = (S)... t+ = ({i}) i S, where t. Set Obviously, a (l) K,L = π i i L, l t, K l, L l+ with L K. π i i K a (l) K,L = P (X L X K ), l t, K l, L l+ with L K (a () S,L = i L π i = P (X L), L ),

13 3 G method in action: from exact sampling to approximate one 45 and ( ) a (l) K,L L K l+ is a probability distribution on K l+ = {K A A l+ } = = {B B l+, B K }, l t, K l. We generate random variable X as follows. (See also discrete mixtures in, e.g., [4, p. 6], the decomposition method in, e.g., [4, p. 66], and the composition method in, e.g., [4, p. 66].) ( ) Step. Generate L () a () S,L. Suppose that we obtained L () = L S L (L S = ). Set K = ( L. ) Step. Generate L () a () K,L. Suppose that we obtained L K 3 L () = L (L K 3 ). Set K = L.. Suppose that at Step t we obtained L (t ) = L t (L t K t t ). Set K t = L t. ( ) Step t. Generate L (t) a (t) K t,l. Suppose that we obtained L K t t+ L (t) = L t (L t K t t+ ). Since t+ = ({i}) i S, it follows that i S such that L t = {i}. Set X = i this value of X is generated according to its probability distribution π because by general multiplicative formula (see, e.g., [3, p. 6]) we have P (X = i) = P (X {i}) = P (X L t ) = P (X L L... L t ) = = a () S,L a () L,L...a (t) L t,l t = π i. The reference method is very fast if, practically speaking, we know the quantities a (l) K,L, l t, K l, L l+ with L K. Unfortunately, this does not happen in general if S is too large. But, fortunately, we can compute all or part of the quantities a (l) K,L when K and L are small this is an important thing (see, e.g., in Section 3, the hybrid chains with P ). To connect the reference method to our collection of hybrid chains, we associate the reference method (for generating a (nite) random variable (when this method is applied to the generation of a (nite) random variable)) with a (nite) Markov chain. To do this, rst, recall that the partitions for the reference method are,,..., t+ Par(S) with = (S)... t+ = ({i}) i S, where t. Let R, R,..., R t S r such that (A) R G,, R G, 3,..., R t G t, t+ ; (A) (R l ) L K = 0, l t {}, K, L l, K L (this assumption implies that R l is a block diagonal matrix and l -stable matrix on l, l t {});

14 46 Udrea Päun 4 (A3) (R l ) U K has in each row just one positive entry, l t, K l, U l+, U K (this assumption implies that (R l ) U K is a row-allowable matrix, l t, K l, U l+, U K). (A) and (A3) are similar to (C) and (c4) from Section, respectively. Therefore, the matrices P, P,..., P t of hybrid chain (obviously, we refer to our hybrid Metropolis-Hastings chain) and the matrices R, R,..., R t have some common things, respectively. This fact contributes to our unication (see Section 4). Suppose that each positive entry of (the matrix) (R l ) U K is equal to a(l) K,U are equal), l t, K (by (A) and (A3), all the positive entries of (R l ) U K l, U l+, U K. Set R = R R...R t. The following result is a main one both for the reference chain (this is dened below) and for (our) hybrid chains. Theorem.4. Under the above assumptions the following statements hold. (i) R l R l+...r t is a block diagonal matrix, l t {}, and l -stable matrix, l t. (ii) πr l R l+...r t = π, l t. (iii) R is a stable matrix. (iv) R = e π. Proof. (i) The rst part follows from (A). Now, we show the second part. By Theorem.5(i) and (A), R l R l+...r t G l, t+. Consequently, R l R l+...r t is a [ l ]-stable matrix, l t, because t+ = ({i}) i S (see Denition.3). Case. l =. Since = (S), it follows that R R...R t is a -stable (stable for short) matrix (see Denition.4). Case. l t {}. By (A) it follows that R l R l+...r t is a l -stable matrix. (ii) Let j S. Then!U (t) t such that {j} U (t) (! = there exists a unique). By (A)(A3) we have ( R + t )U (t) {j} > 0 and ( R + t ) V {j} = 0, V t, V U (t). Further, since t t,!u (t ) t such that U (t) U (t ). By (A) (A3) we have ( R + t > 0 and )U ( R + ) (t ) U (t) t = 0, V V U (t) t, V U (t ). Proceeding in this way, we nd a sequence U (), U (),..., U (t+) such that U (), U (),..., U (t+) t+, {j} = U (t+) U (t)... U () = S,

15 5 G method in action: from exact sampling to approximate one 47 and ( R + l > 0 and )U ( R + ) (l) U (l+) l = 0, l t, V V U (l+) l, V U (l). Let l t. Let i U (l) (j U (l) as well). By (i) (the fact that R l R l+...r t is a l -stable matrix), (A)(A), and Theorem.5 we have (R l R l+...r t ) ij = (R l R l+...r t ) + U (l) {j} = ( R + l R + ) l+...r + t U (l) {j} = = ( R + ( l )U (l) U R + (l+) l+... )U ( R + ) (l+) U (l+) t = a (l) U (l),u (l+) a (l+) U (l+),u (l+)...a (t) U (t),{j} = π j U (t) {j} = k U (l) π k (it was to be expected see (i) that this ratio will not depend on i U (l) ). Consequently, πr l R l+...r t = π. (iii) This follows by (i) or (iv). (iv) By proof of (ii) we have R ij = π j, i, j S. Therefore, R = e π. We associate the reference method for generating X with the Markov chain (this depends on X too) with state space S = r and transition matrix R = R R...R t. Recall even if, here, by Theorem.4(iv), R = e π that any -step transition of this chain is performed via R, R,..., R t, i.e., doing t transitions: one using R, one using R,..., one using R t. We call the above Markov chain the reference (Markov) chain. This is another example of chain with nite convergence time (see, e.g., also [7] for other examples of chains with nite convergence time). The best case is when we know the quantities a (l) K,L, l t, K l, L l+, L K; we can always know all or part of the quantities a (l) K,L when K and L are small a happy case! Passing the reference chain from an initial state, say, (the state at time 0) to a state at time is done using, one after the other, the probability distributions (R ) {} (this is the rst row of matrix R ), (R ) {i },..., (R t), where {it} i l = the state the chain arrives using (R l ) {il }, l t {}, setting i =. The reference method for generating X uses these probability distributions too, in the same order, the 0 s do not count, they can be removed e.g., if ( ) (R ) {} = a (), where S = 4, K () =, K () S,K (), 0, 0, a () S,K () = {3, 4}, then, removing the 0 s, we obtain ( ) a (), S,K (), a () S,K ()

16 48 Udrea Päun 6 which is the probability distribution ( used) by the reference method in its rst step, being, here, equal to K (), K(). Therefore, the reference chain can do what the reference method does (we need to run this chain just one step (or t steps due to R, R,..., R t ) until time inclusive). To illustrate the reference chain, we consider a random variable X with probability distribution π = (π, π,..., π 8 ). Taking the partitions = ( 8 ), = ({,, 3, 4}, {5, 6, 7, 8}), 3 = ({, }, {3, 4}, {5, 6}, {7, 8}), 4 = ({i}) i 8, a reference chain is the Markov chain with state space S = 8 and transition matrix R = R R R 3, where R = a () S,K () 0 a () S,K () a () S,K () 0 0 a () S,K () 0 0 a () S,K () a () S,K () a () S,K () 0 0 a () S,K () a () S,K () a () S,K () 0 0 a () S,K () 0 a () S,K () 0 a () S,K () a () 0 S,K () a () a () S,K () K () = {,, 3, 4}, K () = {5, 6, 7, 8}, a () = π S,K () + π + π 3 + π 4, a () = π S,K () 5 + π 6 + π 7 + π 8 (R G, ), R () = R = R() 0 a () K (),K(3) a () K (),K(3) a () K (),K(3) 0 a () K (),K(3) R (), 0 a () K (),K(3) 0 a () K (),K(3) 0 0 a () K (),K(3) 0 0 a () K (),K(3), S,K () 0,

17 7 G method in action: from exact sampling to approximate one 49 R () = a () K (),K(3) 3 a () K (),K(3) 3 0 a () K (),K(3) 3 a () K (),K(3) 3 0 a () K (),K(3) a () K (),K(3) 4 a () K (),K(3) a () K (),K(3) 4 K (3) = {, }, K (3) = {3, 4}, K (3) 3 = {5, 6}, K (3) 4 = {7, 8}, a () π + π =, a () π 3 + π 4 =, K (),K(3) π + π + π 3 + π 4 K (),K(3) π + π + π 3 + π 4 a () π 5 + π 6 =, a () π 7 + π 8 = K (),K(3) 3 π 5 + π 6 + π 7 + π 8 K (),K(3) 4 π 5 + π 6 + π 7 + π 8 (R G, 3 (moreover, it is a -stable matrix on, -stable matrix on 3, and block diagonal matrix)), and R 3 = R w (3) = a (3) = K w (3),K (4) w R (3) a (3) K (3) w,k (4) w a (3) K (3) w,k (4) w K (4) R (3) R (3) 3 a (3) K (3) w,k (4) w a (3) K (3) w,k (4) w i = {i}, i 8, π w, a (3) = π w + π w K w (3),K (4) w R (3) 4, 0 0, w 4,, π w π w + π w, w 4 (R 3 G 3, 4 (moreover, it is a 3 -stable matrix on 3, 3 -stable matrix (because 4 = ({i}) i 8, see Denition.4), and block diagonal matrix)). By Theorem.4(iv) or direct computation we have R = e π. Warning! In the above example, R, R, R 3 are representatives of certain equivalence classes: R l, where l 3, is a representative of the equivalence class determined by quantities a (l) K,L, K l, L l+ (the number of elements of this class can easily be determined; e.g., R belongs to an equivalence class with the cardinal equal to 4 6 because R K() and R K() are 8 4 matrices,...). Each triple (R, R, R 3 ) of representatives determines a reference chain, all these chains having the product R R R 3 equal to e π (see Theorems.3 and.4(iv)).
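To complement the example above, here is a small Python sketch of the reference method viewed as a sampler: starting from the whole state space, it repeatedly chooses a child block with probability a^(l)_{K,L} = (Σ_{i∈L} π_i)/(Σ_{i∈K} π_i) until a singleton is reached. The partition tree mirrors the one used above for S = ⟨8⟩, but the numerical values of π are hypothetical stand-ins; the code illustrates the method itself, not the particular matrices R_1, R_2, R_3.

```python
import random
from collections import Counter

# Nested partitions of S = {1,...,8}: Delta_1 = (S), then two halves, then pairs,
# then singletons, mirroring the example above.
partitions = [
    [list(range(1, 9))],
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    [[i] for i in range(1, 9)],
]

# Hypothetical positive probability distribution pi on S.
pi = dict(zip(range(1, 9), [0.05, 0.10, 0.20, 0.05, 0.15, 0.25, 0.12, 0.08]))

def reference_sample(pi, partitions, rng=random):
    """One draw by the reference method: at step l, given the current block K of
    Delta_l, choose L in Delta_{l+1}, L a subset of K, with probability
    a^(l)_{K,L} = pi(L)/pi(K); the final singleton gives the sampled value."""
    K = partitions[0][0]
    for next_level in partitions[1:]:
        children = [L for L in next_level if set(L) <= set(K)]
        weights = [sum(pi[i] for i in L) for L in children]
        K = rng.choices(children, weights=weights, k=1)[0]
    return K[0]

# Empirical check that the draws follow pi (up to Monte Carlo error).
counts = Counter(reference_sample(pi, partitions) for _ in range(100_000))
for i in sorted(pi):
    print(i, pi[i], round(counts[i] / 100_000, 3))
```

By Theorem 2.4(iv), the associated reference chain has R = R_1 R_2 R_3 = e'π, so one composite step of that chain reproduces these draws exactly.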

18 430 Udrea Päun 8 Now, it is easy to see that the chain associated with the alias method is a special case of the reference chain. Therefore (this was to be expected), the alias method is a special case of the reference method. Another interesting special case of the reference method (and of the reference chain) is the method of uniform generation of the random permutations of order n from, e.g., Example. in [7]. In Example. from [7], it is presented and analyzed a Markov chain which can do what the swapping method does (for the swapping method, see, e.g., [4, pp ]). When the probability distribution of interest, π, is uniform, we have a (l) K,L = L K, l t, K l, L l+, L K another happy case! We here supposed that there are t + partitions; in Example. from [7], t = n. Finally, for the next sections, we need to associate a hybrid Metropolis- Hastings chain with a reference chain. Below we state the terms of this association. Remark.5. This association makes sense (warning!) if, obviously, both chains are dened by means of the same state space S, the same probability distribution (of interest) π on S, and the same sequence of partitions,,..., t+ on S with = (S)... t+ = ({i}) i S, where t. We shall use expressions as the hybrid Metropolis-Hastings chain and its reference chain, the Gibbs sampler and its reference chain, the reference chain of a hybrid Metropolis-Hastings chain, etc., meaning that both chains from each expression are associated in this manner and the reference chain (with transition matrix R = R R...R t ) from each expression has the matrices R, R,..., R t specied or not, in the latter case, R, R,..., R t are only from the equivalence classes R, R,..., R t, respectively ( R l = the equivalence class of R l, l t ), so, the reader, in this latter case, has a complete freedom to choose the matrices R, R,..., R t as he/she wishes. The association from Remark.5 is good for our unication and, as a result of this, is good for comparisons and improvements (see the next sections). 3. EXACT SAMPLING USING HYBRID CHAINS In this section, rst, we show that the Gibbs sampler on h n (i.e., on {0,,..., h} n ), h, n, belongs to our collection of hybrid Metropolis-Hastings chains from [8]. Second, we give some interesting classes of probability distributions they are interesting because: ) supposing that the generation time is not limited, we can generate any random variable with probability distribution belonging to the union of these classes exactly (not approximately) by Gibbs sampler (sometimes by optimal hybrid Metropolis chain) or by a special Gibbs

19 9 G method in action: from exact sampling to approximate one 43 sampler with grouped coordinates in just one step; ) sometimes, the Gibbs sampler or a special Gibbs sampler with grouped coordinates is identical with its reference chain. An application on the random variables with geometric distribution is given. Third, results on the hybrid chains or reference chains are given. We begin the denition of Gibbs sampler we refer to the cyclic Gibbs sampler. Below we consider the (cyclic) Gibbs sampler on h n, h, n (more generally, we can consider the state space h h... h n, h, h,..., h n, n ), see [0]; see, e.g., also [, 3, 6], [7, pp. 364], [8], [9, Chapter 5], [, pp. and 554], [5, pp. 698], and [, Chapters 5 and 7]. Recall that the entry (i, j) of a matrix Z is denoted Z ij or, if confusion can arise, Z i j. We use the convention that an empty term vanishes. Let x = (x, x,..., x n ) S = h n, h, n. Set x [k l ] = (x, x,..., x l, k, x l+,..., x n ), k h, l n (consequently, x [k l ] S, k h, l n ). Let π be a positive probability distribution on S = h n (h, n ). Set the matrices P l, l n, where 0 if y x [k l ], k h, (P l ) xy = π x[k l ] π if y = x [k l ] for some k h, x[j l ] j h l n, x, y S. Set P = P P...P n. Consider the Markov chain with state space S = h n (h, n ) and transition matrix P above. This chain is called the cyclic Gibbs sampler the Gibbs sampler for short. For labeling the rows and columns of P, P,..., P n and other things, we consider the states of S = h n in lexicographic order, i.e., in the order (0, 0,..., 0), (0, 0,..., 0, ),..., (0, 0,..., 0, h), (0, 0,..., 0,, 0), (0, 0,..., 0,, ),..., (0, 0,..., 0,, h),..., (0, 0,..., 0, h, 0), (0, 0,..., 0, h, ),..., (0, 0,..., 0, h, h),..., (h, h,..., h, 0), (h, h,..., h, ),..., (h, h,..., h). Further, we show that the Gibbs sampler on h n, h, n (more generally, on h h... h n, h, h,..., h n, n ) belongs to our collection of hybrid Metropolis-Hastings chains from [8] and satises, moreover, the conditions (c)(c4), excepting (c3). More precisely, we show that the Gibbs sampler on h n satises all the conditions (C)(C3) (basic conditions) and (c)(c4) (special conditions), excepting (c3), and the equations from the denition of hybrid Metropolis-Hastings chain. To see this, following the second special case from Section 3 in [8] (there it was considered a more

20 43 Udrea Päun 0 general framework, namely, when the coordinates are grouped (blocked) into groups (blocks) of size v), set K (x,x,...,x l ) = {(y, y,..., y n ) (y, y,..., y n ) S and y i = x i, i l }, l n, x, x,..., x l h (obviously, K (x,x,...,x n) = {(x, x,..., x n )}), and = (S), l+ = ( ) K (x,x,...,x l ), l n. x,x,...,x l h Obviously, = (S)... n+ = ({x}) x S. Note also that the sets S, K (x,x,...,x l ), x, x,..., x l h, l n, determine, by inclusion relation, a tree which we call the tree of inclusions. For simplication, below we give the tree of inclusions for S = n (i.e., for S = {0, } n ). S K (0) K () K (0,0) K (0,) K (,0) K (,).. K (0,0,...,0) K (0,0,...,0,) K (,,...,,0) K (,,...,) Following, e.g., [7, pp. 4], we dene the matrices Q, Q,..., Q n as follows: Q l = P l, l n. It is easy to prove that the matrices Q l, l n, satisfy the basic conditions (C)(C3) from Section. Further, it is easy to prove that the matrices P l and Q l satisfy the equations 0 if y x and (Q l ) xy = 0, ( ) (Q (P l ) xy = l ) xy min, πy(q l) yx π x(q l ) if y x and (Q l ) xy xy > 0, (P l ) xz if y = x, z S,z x l n, x, y S. (Further, it follows that the conclusion of Theorem. holds, in particular, for P = P P...P n.) Therefore, the Gibbs sampler on h n belongs to our collection of hybrid Metropolis-Hastings chains from [8]. Now,

21 G method in action: from exact sampling to approximate one 433 it is easy to prove that the Gibbs sampler on h n satises, moreover, the special conditions (c)(c4), excepting (c3). Finally, we have the following result. Theorem 3.. The Gibbs sampler on h n belongs to our collection of hybrid Metropolis-Hastings chains. Moreover, this chain satises the conditions (c)(c4), excepting (c3). Proof. See above. Based on Theorem 3., it is easy now to show that the chain on h n, h, n (more generally, on h h... h n, h, h,..., h n, n ), dened below is, according to our denition, a hybrid Metropolis- Hastings chain which satises, moreover, all or part of the conditions (c)(c4). This chain on h n is a generalization of the (cyclic) Gibbs sampler on h n as follows: the matrices Q, Q,..., Q n of Gibbs sampler (see before Theorem 3.) are, more generally, replaced with the matrices Q, Q,..., Q n (we used the same notation for these) such that _ Q, _ Q,..., _ Q n of the former matrices are identical with _ Q, _ Q,..., _ Q n of the latter matrices, respectively; P = P P...P n is the transition matrix of this chain, where, using Metropolis-Hastings rule, P, P,..., P n are dened by means of the more general matrices Q, Q,..., Q n, respectively. Since we now know the structure of matrices P, P,..., P n corresponding to the update of coordinates,,..., n, respectively, we could study other types of Gibbs samplers on h n, h, n (the random Gibbs sampler, etc., see, e.g., [], [8], [5, pp. 773], and [9]), and, more generally, other types of chains on h n, h, n (a generalization of the random Gibbs sampler, etc.), derived from the generalization from the above paragraph of Gibbs sampler. Recall that R + = {x x R and x > 0}. Recall that the states of S = h n are considered in lexicographic order. Theorem 3.. Let S = h n, h, n. Let w = (h + ) t, 0 t n. Consider on S the probability distribution π = (c 0, c 0 a,..., c 0 a w, c, c a,..., c a w,..., c h, c h a,..., c h a w, c 0, c 0 a,..., c 0 a w, c, c a,..., c a w,..., c h, c h a,..., c h a w,..., c 0, c 0 a,..., c 0 a w, c, c a,..., c a w,..., c h, c h a,..., c h a w ) (the sequence c 0, c 0 a,..., c 0 a w, c, c a,..., c a w,..., c h, c h a,..., c h a w appears (h + ) n t times if 0 t < n and c 0, c 0 a,..., c 0 a w only appears if t = n), where c 0, c,..., c h, a R +. Then, for the Gibbs sampler and, when h =, for the _ optimal hybrid Metropolis chain with the matrices Q, Q,..., Q n such that Q, Q _,..., Q n are identical with Q, Q _,..., Q _ n of the Gibbs sampler on

22 434 Udrea Päun S = n, respectively, we have, using the same notation, P = P P...P n, for the transition matrices of these two chains, P = e π (therefore, the stationarity of these chains is attained at time ). Proof. Since (see the proof of Theorem 3.) = (S), = ( K (0), K (),..., K (h) ), 3 = ( K (0,0), K (0,),..., K (0,h), K (,0), K (,),..., K (,h),..., K (h,0), K (h,),..., K (h,h) ),. n+ = ({x}) x S, we have S = (h + ) n ( S is the cardinal of S), K (0) = K () =... = K (h) = (h + ) n, K (0,0) = K (0,) =... = K (0,h) = K (,0) = K (,) =... = K (,h) = = K (h,0) = K (h,) =... = K (h,h) = (h + ) n,. {x} = (h + ) 0 =, x S. Let l n. Let K l and L l+, L K. Then v, v,..., v l h such that and K = { S if l =, K (v,v,...,v l ) if l, L = K (v,v,...,v l ). Let x = (x, x,..., x n ) K. It follows that x = v, x = v,..., x l = v l (these equations vanish when l = ) and, obviously, x [v l l ] = (x, x,..., x l, v l, x l+,..., x n ) = (v, v,..., v l, v l, x l+,..., x n ) L. Note also that K = (h + ) n l+, L = (h + ) n l, and (P l ) xx[vl l ] > 0. (The reader, if he/she wishes, can use the notation (P l ) x x[vl l ] instead of (P l ) xx[vl l ].)

23 3 G method in action: from exact sampling to approximate one 435 First, we consider the Gibbs sampler. To compute the probabilities (P l ) xx[vl l ], we consider three cases: n l < t; n l = t; n l > t. The case n l < t is a bit more dicult. In this case, the probabilities π x corresponding to the elements x K are, keeping the order, c i a v, c i a v+,..., c i a v+(h+)n l+ for some i h and v w (h + ) n l+ + and those corresponding to the elements x L are, keeping the order, We have c i a v+v l(h+) n l, c i a v+v l(h+) n l +,..., c i a v+(v l+)(h+) n l. c i a v+v l(h+) n l +z h c i a v+s(h+)n l +z s=0 = av l(h+) n l, z h a s(h+)n l s=0 It follows that the rst ratio does not depend on z, z Moreover, it does not depend on c i and v. The others two cases are obvious. We now have (P l ) xx[vl l ] = a v l (h+)b h s=0 c vl h i=0 (h + ) n l. (h + ) n l. a s(h+)b if n l = b for some b t, c i if n l = t, h+ if n l > t. Consequently, P G,, P G, 3,..., P n G n, n+. By Theorems.6,., and 3., P = e π. Second, we consider the optimal hybrid Metropolis chain when h =. In this case, w = t, 0 t n, and π = (c 0, c 0 a,..., c 0 a w, c, c a,..., c a w, c 0, c 0 a,..., c 0 a w, c, c a,..., c a w,..., c 0, c 0 a,..., c 0 a w, c, c a,..., c a w ). As to the positions of positive entries of Q, Q,..., Q n, we have, by hypothesis, Q, Q,..., Q n such that _ Q, _ Q,..., _ Q n are identical with _ Q, _ Q,..., _ Q n of

24 436 Udrea Päun 4 the Gibbs sampler on S = n (see Section and Theorem 3.; see also the second special case from Section 3 in [8]), respectively. It follows that ( min a b, a b) if n l = b for some b t, ( ) f l = min c0 c, c c 0 if n l = t, if n l > t, = x l = f l + r l = f l + = f l + = ( min a b,a b) + min( c0 c, c c0 )+ if n l = b for some b t, if n l = t, if n l > t, and (cases for c 0 and c : c 0 c, c 0 > c (c 0, c R + ); cases for a : a, a > (a R + )) v l +v l a b if n l = b for some b t, +a b (P l ) xx[vl l ] = c vl c 0 +c if n l = t, if n l > t. Note that v l + v l a b = a v l b because v l. It follows that these transition probabilities are identical with those for the Gibbs sampler when h =. This is an interesting thing. See also Theorem 3.6 (the optimal hybrid Metropolis chain is not considered there because of this thing). Proceeding as in the Gibbs sampler case, it follows that P = e π. We call the probability distribution from Theorem 3. the wavy probability distribution (of rst type). Remark 3.3. As to the class of wavy distributions from Theorem 3., the Gibbs sampler is better than the optimal hybrid Metropolis chain when the latter chain has the matrices Q, Q,..., Q n such that _ Q, _ Q,..., _ Q n are identical with _ Q, _ Q,..., _ Q n of the Gibbs sampler on S = h n, respectively. For this see Theorem 3. and the following two examples. () Consider (the probability distribution) π = ( c 0, c 0 a, c 0 a, c, c a, c a, c, c a, c a )

25 5 G method in action: from exact sampling to approximate one 437 on S =, where c 0, c, c, a R +, c i c j, i, j. π is a wavy probability distribution. Suppose, for simplication, that c 0 < c < c. By Theorem 3., for the Gibbs sampler, we have P = e π (P = P P ). It is easy to prove, for the optimal hybrid Metropolis chain, that P e π (P = P P ; we used the same notation for matrices in both cases). () Consider π = (c, ca,..., ca w, c, ca,..., ca w,..., c, ca,..., ca w ) on S = h n, c, a R +, h, n. π is also a wavy probability distribution (the case when c 0 = c =... = c h := c). By Theorem 3., for the Gibbs sampler, we have P = e π (P = P P...P n ). It is easy to prove, for the optimal hybrid Metropolis chain when, e.g., π = ( c, ca,..., ca 8) on S =, c, a R +, a (for a =, π = the uniform probability distribution), that P e π (we also used the same notation for matrices in both cases with the only dierence that P = P P here). By Theorem 3. and Remark 3.3 it is possible that, on h n, the Gibbs sampler or a special generalization of it be the fastest chain in our collection of hybrid Metropolis-Hastings chains. The word fastest refers to Markov chains strictly, not to computers. The running time of our hybrid chains on a computer is another matter (the computational cost per step is the main problem; on a computer, a step of a Markov chain can be performed or not). Example 3.4. Consider the probability distribution π on S = 00 (i.e., on S = {0, } 00 ), where π (0,0,...,0) = d, π x = d 00, x S, x (0, 0,..., 0), where d (0, ) (e.g., d =, or d = 3 4, or d = 9 0 ). Since the sampling from S using the Gibbs sampler or optimal hybrid Metropolis chain can be intractable

26 438 Udrea Päun 6 (on any computer) for some d, one way is breaking of d into many pieces. For this we consider the probability distribution ρ on S = 0, where ρ (0,x) = d 00, ρ (,x) = ( d) , x S ((0, x) and (, x) are vectors from S ). Since ρ (0,x) = π x, x S, x (0, 0,..., 0), it follows that the sampling from S can be performed via the sampling from S. Indeed, letting X be a random variable with the probability distribution π, if, using ρ (on S ), we select a value equal to (0, u) for some u S, u (0, 0,..., 0) S, then we set X = u this value of X is selected according to its probability distribution π on S while, if, using ρ too, we select a value equal to (0, 0,..., 0) S or (, v) for some v S, then we set X = (0, 0,..., 0) (obviously, (0, 0,..., 0) S) this value of X is also selected according to its probability distribution. By Theorem 3. the Gibbs sampler and optimal hybrid Metropolis chain sample exactly (not approximately) from S (equipped with ρ); this implies that the sampling from S (equipped with π) is also exactly. The wavy probability distribution(s) from Theorem 3. has (have) something in common with the geometric distribution. This fact suggests the next application. Application 3.5. To generate, exactly, a random variable with geometric distribution ( p, pq, pq,... ), p, q (0, ), q = p, we can proceed as follows (see, e.g., also [4, p. 500]). We split the geometric distribution into two parts, a tail carrying small probability and a main body of size n, where n, n 0, is suitably chosen. The main body contains the rst n values of geometric distribution and determines the probability distribution where π = ( Zp, Zpq, Zpq,...Zpq n ), Z = q n. We choose the main body with the probability q n (= p + pq pq n ) and the tail with the probability q n. (See also discrete mixtures in, e.g., [4, p. 6], the decomposition method in, e.g., [4, p. 66], and the composition method in, e.g., [4, p. 66].) If the output of choice is the main body, then we can sample exactly (not approximately) from {,,..., n } (equipped with the probability distribution π), using the Gibbs sampler or optimal hybrid Metropolis chain when n, see Theorem 3. (the stationarity is attained at time for each of these chains). Obviously, to use the former or latter chain, we

27 7 G method in action: from exact sampling to approximate one 439 need another distribution, µ we replace the probability distribution π = (π i ) on {,,..., n }, π = Zp, π = Zpq,..., π n = Zpq n, with µ = (µ i ) on n, µ (0,0,...,0) = π, µ (0,0,...,0,) = π, µ (0,0,...,0,,0) = π 3, µ (0,0,...,0,,) = π 4,..., µ (,,...,,0) = π n, µ (,,...,) = π n. Otherwise, i.e., if the output of choice is the tail, we can proceed as follows. Supposing that X is a random variable with the geometric distribution above, i.e., ( p, pq, pq,... ), then, due to the lack-of-memory property of X, X n (X > n here) is a random variable with the same geometric distribution as X, i.e., ( p, pq, pq,... ). Therefore, further, we can work with X n and its probability distribution ( p, pq, pq,... ) (we again split this distribution into two parts, a main body and a tail,...), etc. The case when all the main bodies are of size ( 0 = ) is well-known, see, e.g., [4, p. 498]; we here gave a generalization of this case by Gibbs sampler or optimal hybrid Metropolis chain. The next result says that, sometimes, the Gibbs sampler (in some cases even the optimal hybrid Metropolis chain, see Theorem 3. and its proof and the next result) is identical with its reference chain. Theorem 3.6. Consider on S = h n, h, n, the wavy probability distribution π from Theorem 3.. Consider on S (equipped with π) the Gibbs sampler with transition matrix P = P P...P n and its reference chain with transition matrix R = R R...R n (see Remark.5). Then (P l ) xy = a (l) K,L, l n, K l, L l+ with L K, x K, y L with (P l ) xy > 0 ( l, l n +, are the partitions determined by the Gibbs sampler, see the proof of Theorem 3.), and _ P = R. If, moreover, R l = P _ l, l n, then P l = R l, l n. (Therefore, under all the above conditions, the Gibbs sampler is identical with its reference chain, leaving the initial probability distribution aside.) Proof. First, we show that (P l ) xy = a (l) K,L, l n, K l, L l+ with L K, x K, y L with (P l ) xy > 0. For the Gibbs sampler on S = h n, in the proof of Theorem 3., it was shown that a v l (h+)b if n l = b for some b t, h a s(h+)b s=0 (P l ) xx[vl l ] = c vl if n l = t, h c i i=0 h+ if n l > t.

28 440 Udrea Päun 8 Recall that a (l) K,L = π x x L x K π x, l n, K l, L l+, L K, see the reference method in Section. Recall that K = (h + ) n l+ and L = (h + ) n l, see the proof of Theorem 3.. Case. n l = b for some b t. By proof of Theorem 3. we have Since and a (l) K,L = = we have π x x L x K (h+) b w =0 π x = (h+) b w =0 (h+) b+ w =0 c i a v+v l(h+) b +w c i a v+w = (h+) b w =0 (h+) b+ w =0 a v l(h+) b +w a w a v l(h+) b +w = a v l(h+) b ( + a a (h+)b ) (h+) b+ w =0 a w = ( + a a (h+)b ) + ( ) + a (h+)b + a (h+)b a (h+)b +... ( )... + a h(h+)b + a h(h+)b a (h+)b+ = ( ) ( ) + a a (h+)b + a (h+)b + a a (h+)b +... ( )... + a h(h+)b + a a (h+)b = ( ) ( = + a a (h+)b + a (h+)b a h(h+)b), a (l) K,L = av b l(h+). h a s(h+)b s=0 Case. n l = t. By denition of π (see Theorem 3.) and proof of Theorem 3. we have π x = c vl + c vl a c vl a w = c vl ( + a a w ) x L.

29 9 G method in action: from exact sampling to approximate one 44 and π x = (c 0 + c 0 a c 0 a w ) + (c + c a c a w ) +... x K Consequently,... + (c h + c h a c h a w ) = ( + a a w ) a (l) K,L = c v l. h c i i=0 h c i. Case 3. n l > t. By denition of π and proof of Theorem 3., setting it is easy to see that Consequently, σ L = x L π x, π x = (h + ) σ L. x K a (l) K,L = h +. From Cases 3, we have (P l ) xx[vl l ] = a (l) K,L. Therefore, (P l) xy = a (l) K,L (y = x [v l l ] for some v l h ), l n, K l, L l+ with L K, x K, y L with (P l ) xy > 0. By Theorem.6 and above result we have P = e π. By Theorem.4(iv), R = e π. Therefore, P = R. The other part of conclusion is obvious. In [8], we modied our hybrid (Metropolis-Hastings) chains such that the modied hybrid chains have better upper bounds for p n π (see Theorem.; see also Theorems.8 and.9). Below we present this modication. If P = P P...P t (see Section ) is the transition matrix of a hybrid Metropolis-Hastings chain, we replace the product P s+ P s+ (...P t ( s < t) by the block diagonal s+ -stable matrix (recall that l = K (l),..., K(l), l t +, see Section ) P = P (s) = A (s+) A (s+)... i=0 A (s+) u s+, K(l), u l )


Balance properties of multi-dimensional words Theoretical Computer Science 273 (2002) 197 224 www.elsevier.com/locate/tcs Balance properties of multi-dimensional words Valerie Berthe a;, Robert Tijdeman b a Institut de Mathematiques de Luminy, CNRS-UPR

More information

INTRODUCTION TO MARKOV CHAIN MONTE CARLO

INTRODUCTION TO MARKOV CHAIN MONTE CARLO INTRODUCTION TO MARKOV CHAIN MONTE CARLO 1. Introduction: MCMC In its simplest incarnation, the Monte Carlo method is nothing more than a computerbased exploitation of the Law of Large Numbers to estimate

More information

Lecture 8: Determinants I

Lecture 8: Determinants I 8-1 MATH 1B03/1ZC3 Winter 2019 Lecture 8: Determinants I Instructor: Dr Rushworth January 29th Determinants via cofactor expansion (from Chapter 2.1 of Anton-Rorres) Matrices encode information. Often

More information

Detailed Proof of The PerronFrobenius Theorem

Detailed Proof of The PerronFrobenius Theorem Detailed Proof of The PerronFrobenius Theorem Arseny M Shur Ural Federal University October 30, 2016 1 Introduction This famous theorem has numerous applications, but to apply it you should understand

More information

1 GSW Sets of Systems

1 GSW Sets of Systems 1 Often, we have to solve a whole series of sets of simultaneous equations of the form y Ax, all of which have the same matrix A, but each of which has a different known vector y, and a different unknown

More information

Summary: A Random Walks View of Spectral Segmentation, by Marina Meila (University of Washington) and Jianbo Shi (Carnegie Mellon University)

Summary: A Random Walks View of Spectral Segmentation, by Marina Meila (University of Washington) and Jianbo Shi (Carnegie Mellon University) Summary: A Random Walks View of Spectral Segmentation, by Marina Meila (University of Washington) and Jianbo Shi (Carnegie Mellon University) The authors explain how the NCut algorithm for graph bisection

More information

Polynomial functions over nite commutative rings

Polynomial functions over nite commutative rings Polynomial functions over nite commutative rings Balázs Bulyovszky a, Gábor Horváth a, a Institute of Mathematics, University of Debrecen, Pf. 400, Debrecen, 4002, Hungary Abstract We prove a necessary

More information

Coins with arbitrary weights. Abstract. Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to

Coins with arbitrary weights. Abstract. Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to Coins with arbitrary weights Noga Alon Dmitry N. Kozlov y Abstract Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to decide if all the m given coins have the

More information

Linear algebra. S. Richard

Linear algebra. S. Richard Linear algebra S. Richard Fall Semester 2014 and Spring Semester 2015 2 Contents Introduction 5 0.1 Motivation.................................. 5 1 Geometric setting 7 1.1 The Euclidean space R n..........................

More information

The matrix approach for abstract argumentation frameworks

The matrix approach for abstract argumentation frameworks The matrix approach for abstract argumentation frameworks Claudette CAYROL, Yuming XU IRIT Report RR- -2015-01- -FR February 2015 Abstract The matrices and the operation of dual interchange are introduced

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative

More information

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank David Glickenstein November 3, 4 Representing graphs as matrices It will sometimes be useful to represent graphs

More information

Section Summary. Relations and Functions Properties of Relations. Combining Relations

Section Summary. Relations and Functions Properties of Relations. Combining Relations Chapter 9 Chapter Summary Relations and Their Properties n-ary Relations and Their Applications (not currently included in overheads) Representing Relations Closures of Relations (not currently included

More information

Reduction of two-loop Feynman integrals. Rob Verheyen

Reduction of two-loop Feynman integrals. Rob Verheyen Reduction of two-loop Feynman integrals Rob Verheyen July 3, 2012 Contents 1 The Fundamentals at One Loop 2 1.1 Introduction.............................. 2 1.2 Reducing the One-loop Case.....................

More information

Foundations of Matrix Analysis

Foundations of Matrix Analysis 1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the

More information

The Degree of the Splitting Field of a Random Polynomial over a Finite Field

The Degree of the Splitting Field of a Random Polynomial over a Finite Field The Degree of the Splitting Field of a Random Polynomial over a Finite Field John D. Dixon and Daniel Panario School of Mathematics and Statistics Carleton University, Ottawa, Canada fjdixon,danielg@math.carleton.ca

More information

On asymptotic behavior of a finite Markov chain

On asymptotic behavior of a finite Markov chain 1 On asymptotic behavior of a finite Markov chain Alina Nicolae Department of Mathematical Analysis Probability. University Transilvania of Braşov. Romania. Keywords: convergence, weak ergodicity, strong

More information

Institute for Advanced Computer Studies. Department of Computer Science. On Markov Chains with Sluggish Transients. G. W. Stewart y.

Institute for Advanced Computer Studies. Department of Computer Science. On Markov Chains with Sluggish Transients. G. W. Stewart y. University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR{94{77 TR{3306 On Markov Chains with Sluggish Transients G. W. Stewart y June, 994 ABSTRACT

More information

Key words. Feedback shift registers, Markov chains, stochastic matrices, rapid mixing

Key words. Feedback shift registers, Markov chains, stochastic matrices, rapid mixing MIXING PROPERTIES OF TRIANGULAR FEEDBACK SHIFT REGISTERS BERND SCHOMBURG Abstract. The purpose of this note is to show that Markov chains induced by non-singular triangular feedback shift registers and

More information

Homework 1 Solutions ECEn 670, Fall 2013

Homework 1 Solutions ECEn 670, Fall 2013 Homework Solutions ECEn 670, Fall 03 A.. Use the rst seven relations to prove relations (A.0, (A.3, and (A.6. Prove (F G c F c G c (A.0. (F G c ((F c G c c c by A.6. (F G c F c G c by A.4 Prove F (F G

More information

The super line graph L 2

The super line graph L 2 Discrete Mathematics 206 (1999) 51 61 www.elsevier.com/locate/disc The super line graph L 2 Jay S. Bagga a;, Lowell W. Beineke b, Badri N. Varma c a Department of Computer Science, College of Science and

More information

4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial

4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial Linear Algebra (part 4): Eigenvalues, Diagonalization, and the Jordan Form (by Evan Dummit, 27, v ) Contents 4 Eigenvalues, Diagonalization, and the Jordan Canonical Form 4 Eigenvalues, Eigenvectors, and

More information

RESEARCH ARTICLE. An extension of the polytope of doubly stochastic matrices

RESEARCH ARTICLE. An extension of the polytope of doubly stochastic matrices Linear and Multilinear Algebra Vol. 00, No. 00, Month 200x, 1 15 RESEARCH ARTICLE An extension of the polytope of doubly stochastic matrices Richard A. Brualdi a and Geir Dahl b a Department of Mathematics,

More information

Pade approximants and noise: rational functions

Pade approximants and noise: rational functions Journal of Computational and Applied Mathematics 105 (1999) 285 297 Pade approximants and noise: rational functions Jacek Gilewicz a; a; b;1, Maciej Pindor a Centre de Physique Theorique, Unite Propre

More information

An O(n 2 ) algorithm for maximum cycle mean of Monge matrices in max-algebra

An O(n 2 ) algorithm for maximum cycle mean of Monge matrices in max-algebra Discrete Applied Mathematics 127 (2003) 651 656 Short Note www.elsevier.com/locate/dam An O(n 2 ) algorithm for maximum cycle mean of Monge matrices in max-algebra Martin Gavalec a;,jan Plavka b a Department

More information

Markov Random Fields

Markov Random Fields Markov Random Fields 1. Markov property The Markov property of a stochastic sequence {X n } n 0 implies that for all n 1, X n is independent of (X k : k / {n 1, n, n + 1}), given (X n 1, X n+1 ). Another

More information

5 Eigenvalues and Diagonalization

5 Eigenvalues and Diagonalization Linear Algebra (part 5): Eigenvalues and Diagonalization (by Evan Dummit, 27, v 5) Contents 5 Eigenvalues and Diagonalization 5 Eigenvalues, Eigenvectors, and The Characteristic Polynomial 5 Eigenvalues

More information

Nordhaus-Gaddum Theorems for k-decompositions

Nordhaus-Gaddum Theorems for k-decompositions Nordhaus-Gaddum Theorems for k-decompositions Western Michigan University October 12, 2011 A Motivating Problem Consider the following problem. An international round-robin sports tournament is held between

More information

Dot Products, Transposes, and Orthogonal Projections

Dot Products, Transposes, and Orthogonal Projections Dot Products, Transposes, and Orthogonal Projections David Jekel November 13, 2015 Properties of Dot Products Recall that the dot product or standard inner product on R n is given by x y = x 1 y 1 + +

More information

1. Affine Grassmannian for G a. Gr Ga = lim A n. Intuition. First some intuition. We always have to rst approximation

1. Affine Grassmannian for G a. Gr Ga = lim A n. Intuition. First some intuition. We always have to rst approximation PROBLEM SESSION I: THE AFFINE GRASSMANNIAN TONY FENG In this problem we are proving: 1 Affine Grassmannian for G a Gr Ga = lim A n n with A n A n+1 being the inclusion of a hyperplane in the obvious way

More information

Kernels of Directed Graph Laplacians. J. S. Caughman and J.J.P. Veerman

Kernels of Directed Graph Laplacians. J. S. Caughman and J.J.P. Veerman Kernels of Directed Graph Laplacians J. S. Caughman and J.J.P. Veerman Department of Mathematics and Statistics Portland State University PO Box 751, Portland, OR 97207. caughman@pdx.edu, veerman@pdx.edu

More information

Solution Set 7, Fall '12

Solution Set 7, Fall '12 Solution Set 7, 18.06 Fall '12 1. Do Problem 26 from 5.1. (It might take a while but when you see it, it's easy) Solution. Let n 3, and let A be an n n matrix whose i, j entry is i + j. To show that det

More information

Roots of Unity, Cyclotomic Polynomials and Applications

Roots of Unity, Cyclotomic Polynomials and Applications Swiss Mathematical Olympiad smo osm Roots of Unity, Cyclotomic Polynomials and Applications The task to be done here is to give an introduction to the topics in the title. This paper is neither complete

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Economics 472. Lecture 10. where we will refer to y t as a m-vector of endogenous variables, x t as a q-vector of exogenous variables,

Economics 472. Lecture 10. where we will refer to y t as a m-vector of endogenous variables, x t as a q-vector of exogenous variables, University of Illinois Fall 998 Department of Economics Roger Koenker Economics 472 Lecture Introduction to Dynamic Simultaneous Equation Models In this lecture we will introduce some simple dynamic simultaneous

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Discrete Applied Mathematics

Discrete Applied Mathematics Discrete Applied Mathematics 194 (015) 37 59 Contents lists available at ScienceDirect Discrete Applied Mathematics journal homepage: wwwelseviercom/locate/dam Loopy, Hankel, and combinatorially skew-hankel

More information

k-degenerate Graphs Allan Bickle Date Western Michigan University

k-degenerate Graphs Allan Bickle Date Western Michigan University k-degenerate Graphs Western Michigan University Date Basics Denition The k-core of a graph G is the maximal induced subgraph H G such that δ (H) k. The core number of a vertex, C (v), is the largest value

More information

ACI-matrices all of whose completions have the same rank

ACI-matrices all of whose completions have the same rank ACI-matrices all of whose completions have the same rank Zejun Huang, Xingzhi Zhan Department of Mathematics East China Normal University Shanghai 200241, China Abstract We characterize the ACI-matrices

More information

Homework 10 Solution

Homework 10 Solution CS 174: Combinatorics and Discrete Probability Fall 2012 Homewor 10 Solution Problem 1. (Exercise 10.6 from MU 8 points) The problem of counting the number of solutions to a napsac instance can be defined

More information

(1) A frac = b : a, b A, b 0. We can define addition and multiplication of fractions as we normally would. a b + c d

(1) A frac = b : a, b A, b 0. We can define addition and multiplication of fractions as we normally would. a b + c d The Algebraic Method 0.1. Integral Domains. Emmy Noether and others quickly realized that the classical algebraic number theory of Dedekind could be abstracted completely. In particular, rings of integers

More information

STGs may contain redundant states, i.e. states whose. State minimization is the transformation of a given

STGs may contain redundant states, i.e. states whose. State minimization is the transformation of a given Completely Specied Machines STGs may contain redundant states, i.e. states whose function can be accomplished by other states. State minimization is the transformation of a given machine into an equivalent

More information

Groups. 3.1 Definition of a Group. Introduction. Definition 3.1 Group

Groups. 3.1 Definition of a Group. Introduction. Definition 3.1 Group C H A P T E R t h r e E Groups Introduction Some of the standard topics in elementary group theory are treated in this chapter: subgroups, cyclic groups, isomorphisms, and homomorphisms. In the development

More information

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and

More information

A PRIMER ON SESQUILINEAR FORMS

A PRIMER ON SESQUILINEAR FORMS A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form

More information

MATRICES. a m,1 a m,n A =

MATRICES. a m,1 a m,n A = MATRICES Matrices are rectangular arrays of real or complex numbers With them, we define arithmetic operations that are generalizations of those for real and complex numbers The general form a matrix of

More information

Average Reward Parameters

Average Reward Parameters Simulation-Based Optimization of Markov Reward Processes: Implementation Issues Peter Marbach 2 John N. Tsitsiklis 3 Abstract We consider discrete time, nite state space Markov reward processes which depend

More information

Linear Algebra in Actuarial Science: Slides to the lecture

Linear Algebra in Actuarial Science: Slides to the lecture Linear Algebra in Actuarial Science: Slides to the lecture Fall Semester 2010/2011 Linear Algebra is a Tool-Box Linear Equation Systems Discretization of differential equations: solving linear equations

More information

MATH 61-02: PRACTICE PROBLEMS FOR FINAL EXAM

MATH 61-02: PRACTICE PROBLEMS FOR FINAL EXAM MATH 61-02: PRACTICE PROBLEMS FOR FINAL EXAM (FP1) The exclusive or operation, denoted by and sometimes known as XOR, is defined so that P Q is true iff P is true or Q is true, but not both. Prove (through

More information

Boolean Inner-Product Spaces and Boolean Matrices

Boolean Inner-Product Spaces and Boolean Matrices Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver

More information

Combinations. April 12, 2006

Combinations. April 12, 2006 Combinations April 12, 2006 Combinations, April 12, 2006 Binomial Coecients Denition. The number of distinct subsets with j elements that can be chosen from a set with n elements is denoted by ( n j).

More information

ELA

ELA SUBDOMINANT EIGENVALUES FOR STOCHASTIC MATRICES WITH GIVEN COLUMN SUMS STEVE KIRKLAND Abstract For any stochastic matrix A of order n, denote its eigenvalues as λ 1 (A),,λ n(a), ordered so that 1 = λ 1

More information

A fast algorithm to generate necklaces with xed content

A fast algorithm to generate necklaces with xed content Theoretical Computer Science 301 (003) 477 489 www.elsevier.com/locate/tcs Note A fast algorithm to generate necklaces with xed content Joe Sawada 1 Department of Computer Science, University of Toronto,

More information

1. The Polar Decomposition

1. The Polar Decomposition A PERSONAL INTERVIEW WITH THE SINGULAR VALUE DECOMPOSITION MATAN GAVISH Part. Theory. The Polar Decomposition In what follows, F denotes either R or C. The vector space F n is an inner product space with

More information

SUMS PROBLEM COMPETITION, 2000

SUMS PROBLEM COMPETITION, 2000 SUMS ROBLEM COMETITION, 2000 SOLUTIONS 1 The result is well known, and called Morley s Theorem Many proofs are known See for example HSM Coxeter, Introduction to Geometry, page 23 2 If the number of vertices,

More information

Chapter 3. Differentiable Mappings. 1. Differentiable Mappings

Chapter 3. Differentiable Mappings. 1. Differentiable Mappings Chapter 3 Differentiable Mappings 1 Differentiable Mappings Let V and W be two linear spaces over IR A mapping L from V to W is called a linear mapping if L(u + v) = Lu + Lv for all u, v V and L(λv) =

More information

Canonical lossless state-space systems: staircase forms and the Schur algorithm

Canonical lossless state-space systems: staircase forms and the Schur algorithm Canonical lossless state-space systems: staircase forms and the Schur algorithm Ralf L.M. Peeters Bernard Hanzon Martine Olivi Dept. Mathematics School of Mathematical Sciences Projet APICS Universiteit

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra

A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra International Mathematical Forum, 4, 2009, no. 24, 1157-1171 A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra Zvi Retchkiman Königsberg Instituto Politécnico Nacional,

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1 MATH 56A: STOCHASTIC PROCESSES CHAPTER. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter

More information

ELA

ELA Volume 16, pp 171-182, July 2007 http://mathtechnionacil/iic/ela SUBDIRECT SUMS OF DOUBLY DIAGONALLY DOMINANT MATRICES YAN ZHU AND TING-ZHU HUANG Abstract The problem of when the k-subdirect sum of a doubly

More information

On Projective Planes

On Projective Planes C-UPPSATS 2002:02 TFM, Mid Sweden University 851 70 Sundsvall Tel: 060-14 86 00 On Projective Planes 1 2 7 4 3 6 5 The Fano plane, the smallest projective plane. By Johan Kåhrström ii iii Abstract It was

More information

Existence of Some Signed Magic Arrays

Existence of Some Signed Magic Arrays Existence of Some Signed Magic Arrays arxiv:1701.01649v1 [math.co] 6 Jan 017 Abdollah Khodkar Department of Mathematics University of West Georgia Carrollton, GA 30118 akhodkar@westga.edu Christian Schulz

More information

Simple Lie subalgebras of locally nite associative algebras

Simple Lie subalgebras of locally nite associative algebras Simple Lie subalgebras of locally nite associative algebras Y.A. Bahturin Department of Mathematics and Statistics Memorial University of Newfoundland St. John's, NL, A1C5S7, Canada A.A. Baranov Department

More information

CHANGE OF BASIS AND ALL OF THAT

CHANGE OF BASIS AND ALL OF THAT CHANGE OF BASIS AND ALL OF THAT LANCE D DRAGER Introduction The goal of these notes is to provide an apparatus for dealing with change of basis in vector spaces, matrices of linear transformations, and

More information

Contents. 6 Systems of First-Order Linear Dierential Equations. 6.1 General Theory of (First-Order) Linear Systems

Contents. 6 Systems of First-Order Linear Dierential Equations. 6.1 General Theory of (First-Order) Linear Systems Dierential Equations (part 3): Systems of First-Order Dierential Equations (by Evan Dummit, 26, v 2) Contents 6 Systems of First-Order Linear Dierential Equations 6 General Theory of (First-Order) Linear

More information

A proof of the Jordan normal form theorem

A proof of the Jordan normal form theorem A proof of the Jordan normal form theorem Jordan normal form theorem states that any matrix is similar to a blockdiagonal matrix with Jordan blocks on the diagonal. To prove it, we first reformulate it

More information

Institute for Advanced Computer Studies. Department of Computer Science. On the Convergence of. Multipoint Iterations. G. W. Stewart y.

Institute for Advanced Computer Studies. Department of Computer Science. On the Convergence of. Multipoint Iterations. G. W. Stewart y. University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR{93{10 TR{3030 On the Convergence of Multipoint Iterations G. W. Stewart y February, 1993 Reviseed,

More information

On the classication of algebras

On the classication of algebras Technische Universität Carolo-Wilhelmina Braunschweig Institut Computational Mathematics On the classication of algebras Morten Wesche September 19, 2016 Introduction Higman (1950) published the papers

More information

Example: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected

Example: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected 4. Markov Chains A discrete time process {X n,n = 0,1,2,...} with discrete state space X n {0,1,2,...} is a Markov chain if it has the Markov property: P[X n+1 =j X n =i,x n 1 =i n 1,...,X 0 =i 0 ] = P[X

More information

[3] (b) Find a reduced row-echelon matrix row-equivalent to ,1 2 2

[3] (b) Find a reduced row-echelon matrix row-equivalent to ,1 2 2 MATH Key for sample nal exam, August 998 []. (a) Dene the term \reduced row-echelon matrix". A matrix is reduced row-echelon if the following conditions are satised. every zero row lies below every nonzero

More information