G METHOD IN ACTION: FROM EXACT SAMPLING TO APPROXIMATE ONE
UDREA PĂUN

Communicated by Marius Iosifescu

The main contribution of this work is the unification, by the G method using Markov chains (therefore, a Markovian unification), in the finite case, of five (even six) sampling methods from exact and approximate sampling theory. This unification is in conjunction with a main problem, our problem of interest: finding the fastest Markov chains in sampling theory based on Metropolis-Hastings chains and their derivatives. We show, in the finite case, that the cyclic Gibbs sampler (the Gibbs sampler for short) belongs to our collection of hybrid Metropolis-Hastings chains from [U. Păun, A hybrid Metropolis-Hastings chain, Rev. Roumaine Math. Pures Appl. 56 (2011), 207-228]. So, we obtain, for any type of Gibbs sampler (cyclic, random, etc.), in the finite case, the structure of the matrices corresponding to the coordinate updates. Concerning our hybrid Metropolis-Hastings chains, to do unifications and, as a result of these, to do comparisons and improvements, we construct, by the G method, a chain which we call the reference chain. This is a very fast chain because it attains its stationarity at time 1. Moreover, the reference chain is the best one we can construct (concerning our hybrid chains); see the Uniqueness Theorem. The reference chain is constructed such that it can do what an exact sampling method, which we call the reference method, does. This method contains, as special cases, the alias method and the swapping method. Coming back to the reference chain, it is sometimes identical with the Gibbs sampler or with a special Gibbs sampler with grouped coordinates; we illustrate this case for two classes of wavy probability distributions. As a result of these facts, we give a method for generating, exactly (not approximately), a random variable with geometric distribution.

Finally, we state the fundamental idea on the speed of convergence: the nearer our hybrid chain is to its reference chain, the faster our hybrid chain is. The addendum shows that we are on the right track.

AMS 2010 Subject Classification: 60J10, 65C05, 65C10, 68U20.

Key words: G method, unification, exact sampling, alias method, swapping method, reference method, reference chain, approximate sampling, hybrid Metropolis-Hastings chain, optimal hybrid Metropolis chain, Gibbs sampler, wavy probability distribution, uniqueness theorem, fundamental idea on the speed of convergence, G comparison.

REV. ROUMAINE MATH. PURES APPL. 62 (2017), 3, 413-452
1. SOME BASIC THINGS

In this section, we present some basic things (notation, notions, and results) from [7, 8], with completions. [7] refers, especially, to very fast Markov chains (i.e., Markov chains which converge very fast), while in [8], based on [7] (the swapping-method example there was the starting point), a collection of hybrid Metropolis-Hastings chains is constructed. This collection of Markov chains has something in common (this is interesting!) with some exact sampling methods (see the next sections); one of these methods, the swapping method, suggested, in fact, the construction of this collection (the example from [7] mentioned above gives a chain which can do what the swapping method does). These common things could help us to find fast approximate sampling methods or, at best, fast exact sampling methods (see the fundamental idea on the speed of convergence, etc.). This way, based on exact sampling methods, of finding efficient exact or approximate sampling methods is our way, the best way. On the other hand, the exact sampling methods are important sources of very fast Markov chains.

Set Par(E) = {Δ | Δ is a partition of E}, where E is a nonempty set. We shall agree that the partitions do not contain the empty set.

Definition 1.1. Let Δ1, Δ2 ∈ Par(E). We say that Δ1 is finer than Δ2 if ∀V ∈ Δ1, ∃W ∈ Δ2 such that V ⊆ W. Write Δ1 ⪯ Δ2 when Δ1 is finer than Δ2.

In this article, a vector is a row vector and a stochastic matrix is a row stochastic matrix. The entry (i, j) of a matrix Z will be denoted Z_ij or, if confusion can arise, Z_{i,j}. Set

⟨m⟩ = {1, 2, ..., m} (m ≥ 1), ⟨⟨m⟩⟩ = {0, 1, ..., m} (m ≥ 0),
N_{m,n} = {P | P is a nonnegative m×n matrix},
S_{m,n} = {P | P is a stochastic m×n matrix},
N_n = N_{n,n}, S_n = S_{n,n}.

Let P = (P_ij) ∈ N_{m,n}. Let ∅ ≠ U ⊆ ⟨m⟩ and ∅ ≠ V ⊆ ⟨n⟩. Set the matrices

P_U = (P_ij)_{i∈U, j∈⟨n⟩}, P^V = (P_ij)_{i∈⟨m⟩, j∈V}, and P^V_U = (P_ij)_{i∈U, j∈V}
(e.g., P_{{1}} is the first row of P, P^{{3}} is the third column of P, and P^{{2}}_{{1}} = (P_12)). Set

({i})_{i∈{s1,s2,...,st}} = ({s1}, {s2}, ..., {st});

({i})_{i∈{s1,s2,...,st}} ∈ Par({s1, s2, ..., st}). E.g., ({i})_{i∈⟨n⟩} = ({1}, {2}, ..., {n}).

Definition 1.2. Let P ∈ N_{m,n}. We say that P is a generalized stochastic matrix if ∃a ≥ 0 and ∃Q ∈ S_{m,n} such that P = aQ.

Definition 1.3 ([7]). Let P ∈ N_{m,n}. Let Δ ∈ Par(⟨m⟩) and Σ ∈ Par(⟨n⟩). We say that P is a [Δ]-stable matrix on Σ if P^L_K is a generalized stochastic matrix, ∀K ∈ Δ, ∀L ∈ Σ. In particular, a [Δ]-stable matrix on ({i})_{i∈⟨n⟩} is called [Δ]-stable for short.

Definition 1.4 ([7]). Let P ∈ N_{m,n}. Let Δ ∈ Par(⟨m⟩) and Σ ∈ Par(⟨n⟩). We say that P is a Δ-stable matrix on Σ if Δ is the least fine partition for which P is a [Δ]-stable matrix on Σ. In particular, a Δ-stable matrix on ({i})_{i∈⟨n⟩} is called Δ-stable, while a (⟨m⟩)-stable matrix on Σ is called stable on Σ for short. A stable matrix on ({i})_{i∈⟨n⟩} is called stable for short.

For interesting examples of Δ-stable matrices on Σ for some Δ and Σ, see Sections 2 and 3.

Let Δ1 ∈ Par(⟨m⟩) and Δ2 ∈ Par(⟨n⟩). Set (see [7] for G_{Δ1,Δ2} and [8] for Ḡ_{Δ1,Δ2})

G_{Δ1,Δ2} = {P | P ∈ S_{m,n} and P is a [Δ1]-stable matrix on Δ2}

and

Ḡ_{Δ1,Δ2} = {P | P ∈ N_{m,n} and P is a [Δ1]-stable matrix on Δ2}.

When we study, or even when we construct, products of nonnegative matrices (in particular, products of stochastic matrices) using G_{Δ1,Δ2} or Ḡ_{Δ1,Δ2}, we shall refer to this as the G method.

Let P ∈ Ḡ_{Δ1,Δ2}. Let K ∈ Δ1 and L ∈ Δ2. Then ∃a_{K,L} ≥ 0 and ∃Q_{K,L} ∈ S_{|K|,|L|} such that P^L_K = a_{K,L} Q_{K,L}. Set

P⁺ = (P⁺_{KL})_{K∈Δ1, L∈Δ2}, P⁺_{KL} = a_{K,L}, ∀K ∈ Δ1, ∀L ∈ Δ2
(P⁺_{KL}, K ∈ Δ1, L ∈ Δ2, are the entries of the matrix P⁺). If confusion can arise, we write P⁺(Δ1, Δ2) instead of P⁺. In this article, when we work with the operator (·)⁺ = (·)⁺(Δ1, Δ2), we suppose, for labeling the rows and columns of matrices, that Δ1 and Δ2 are ordered sets (i.e., these are sets where the order in which we write their elements counts), even if we omit to specify this. E.g., let P ∈ G_{Δ1,Δ2}, where Δ1 = ({1, 2}, {3}) and Δ2 = ({1, 2}, {3, 4}). Then P⁺ = P⁺(Δ1, Δ2) is a 2×2 matrix ({1, 2} and {3} are the first and the second element of Δ1, respectively; based on this order, the first and the second row of P⁺ are labeled {1, 2} and {3}, respectively. The columns of P⁺ are labeled similarly.)

Below we give a basic result.

Theorem 1.5 ([8]). Let P ∈ Ḡ_{Δ1,Δ2} ∩ N_{m,n} and Q ∈ Ḡ_{Δ2,Δ3} ∩ N_{n,p}. Then
(i) PQ ∈ Ḡ_{Δ1,Δ3} ∩ N_{m,p};
(ii) (PQ)⁺ = P⁺Q⁺.

Proof. See [8].

In this article, the transpose of a vector x is denoted x′. Set e = e(n) = (1, 1, ..., 1) ∈ R^n, n ≥ 1.

Below we give an important result.

Theorem 1.6 ([8]). Let P1 ∈ Ḡ_{(⟨m1⟩),Δ2} ∩ N_{m1,m2}, P2 ∈ Ḡ_{Δ2,Δ3} ∩ N_{m2,m3}, ..., P_{n−1} ∈ Ḡ_{Δ_{n−1},Δn} ∩ N_{m_{n−1},mn}, Pn ∈ Ḡ_{Δn,({i})_{i∈⟨m_{n+1}⟩}} ∩ N_{mn,m_{n+1}}. Then
(i) P1P2⋯Pn is a stable matrix;
(ii) (P1P2⋯Pn)_{{i}} = P1⁺P2⁺⋯Pn⁺, ∀i ∈ ⟨m1⟩ ((P1P2⋯Pn)_{{i}} is the row i of P1P2⋯Pn); therefore, P1P2⋯Pn = e′π, where π = P1⁺P2⁺⋯Pn⁺.

Proof. See [8].

Remark 1.7. Under the assumptions of Theorem 1.6, but taking P1 ∈ S_{m1,m2}, P2 ∈ S_{m2,m3}, ..., P_{n−1} ∈ S_{m_{n−1},mn}, Pn ∈ S_{mn,m_{n+1}}, we have

pP1P2⋯Pn = π
for any probability distribution p on ⟨m1⟩. Consequently, Theorem 1.6 could be used to prove that certain Markov chains have finite convergence time (see [7] and Sections 2 and 3 for some examples).

Let P ∈ N_{m,n}. Set

α(P) = min_{i,j∈⟨m⟩} Σ_{k=1}^{n} min(P_ik, P_jk)

and

ᾱ(P) = (1/2) max_{i,j∈⟨m⟩} Σ_{k=1}^{n} |P_ik − P_jk|.

If P ∈ S_{m,n}, then α(P) is called the Dobrushin ergodicity coefficient of P ([5]; see, e.g., also [3, p. 56]).

Theorem 1.8. (i) ᾱ(P) = 1 − α(P), ∀P ∈ S_{m,n}.
(ii) ‖μP − νP‖ ≤ ‖μ − ν‖ ᾱ(P), ∀μ, ν, μ and ν probability distributions on ⟨m⟩, ∀P ∈ S_{m,n} (‖·‖ being the l1-norm).
(iii) ᾱ(PQ) ≤ ᾱ(P) ᾱ(Q), ∀P ∈ S_{m,n}, ∀Q ∈ S_{n,p}.

Proof. (i) See, e.g., [3, p. 57] or [4, p. 44]. (ii) See, e.g., [5] or [4, p. 47]. (iii) See, e.g., [5], or [3, pp. 58-59], or [4, p. 45].

Theorem 1.6 (see also Remark 1.7) could be used, e.g., in exact sampling theory based on finite Markov chains (see Section 3; see also Section 2), while the next result could be used, e.g., in approximate sampling theory based on finite Markov chains (see Section 4).

Theorem 1.9 ([8]). Let P1 ∈ N_{m1,m2}, P2 ∈ N_{m2,m3}, ..., Pn ∈ N_{mn,m_{n+1}}. Let Δ1 = (⟨m1⟩), Δ2 ∈ Par(⟨m2⟩), ..., Δn ∈ Par(⟨mn⟩), Δ_{n+1} = ({i})_{i∈⟨m_{n+1}⟩}. Consider the matrices L_l = ((L_l)_{VW})_{V∈Δ_l, W∈Δ_{l+1}} ((L_l)_{VW} is the entry (V, W) of the matrix L_l), where

(L_l)_{VW} = min_{i∈V} Σ_{j∈W} (P_l)_ij, ∀l ∈ ⟨n⟩, ∀V ∈ Δ_l, ∀W ∈ Δ_{l+1}.

Then

α(P1P2⋯Pn) ≥ Σ_{K∈Δ_{n+1}} (L1L2⋯Ln)_{⟨m1⟩K}.

(Since L1L2⋯Ln is a 1×m_{n+1} matrix, it can be thought of as a row vector, but above we used, and below we shall use, if necessary, the matrix notation for its entries instead of the vector one. Above, the matrix notation (L1L2⋯Ln)_{⟨m1⟩K} was used instead of the vector one (L1L2⋯Ln)_K because, in this article, the notation A_U, where A ∈ N_{p,q} and U ⊆ ⟨p⟩, means something different.)
Proof. See [8]. (Theorem 1.9 is part of a more general result from [8].)

Definition 1.10 (see, e.g., [0, p. 80]). Let P ∈ N_{m,n}. We say that P is a row-allowable matrix if it has at least one positive entry in each row.

Let P ∈ N_{m,n}. Set P̄ = (P̄_ij) ∈ N_{m,n},

P̄_ij = 1 if P_ij > 0, and P̄_ij = 0 if P_ij = 0, ∀i ∈ ⟨m⟩, ∀j ∈ ⟨n⟩.

We call P̄ the incidence matrix of P (see, e.g., [3]).

In this article, some statements on matrices hold, obviously, possibly after a permutation of rows and columns. For simplification, further on, we omit to specify this fact.

Warning! In this article, if a Markov chain has the transition matrix P = P1P2⋯Ps, where s ≥ 1 and P1, P2, ..., Ps are stochastic matrices, then any 1-step transition of this chain is performed via P1, P2, ..., Ps, i.e., doing s transitions: one using P1, one using P2, ..., one using Ps. (See also Section 2.)

Let S = ⟨r⟩. Let π = (π_i)_{i∈S} = (π1, π2, ..., πr) be a positive probability distribution on S. One way to sample, approximately or, at best, exactly, from S when r ≥ 2 is by means of the hybrid Metropolis-Hastings chain from [8]. Below we define this chain.

Let E be a nonempty set. Set Δ1 ≺ Δ2 if Δ1 ⪯ Δ2 and Δ1 ≠ Δ2, where Δ1, Δ2 ∈ Par(E). Let Δ1, Δ2, ..., Δ_{t+1} ∈ Par(S) with Δ1 = (S), Δ_{t+1} = ({i})_{i∈S}, and Δ_{l+1} ≺ Δ_l, ∀l ∈ ⟨t⟩, where t ≥ 1. Let Q1, Q2, ..., Qt ∈ S_r such that

(C1) Q̄1, Q̄2, ..., Q̄t are symmetric matrices;
(C2) (Q_l)^L_K = 0, ∀l ∈ ⟨t⟩ − {1}, ∀K, L ∈ Δ_l, K ≠ L (this assumption implies that Q_l is a block diagonal matrix and a [Δ_l]-stable matrix on Δ_l, ∀l ∈ ⟨t⟩ − {1});
(C3) (Q_l)^U_K is a row-allowable matrix, ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀U ∈ Δ_{l+1}, U ⊆ K.

Although Q_l, l ∈ ⟨t⟩, are not irreducible matrices if l ≥ 2, we define the matrices P_l, l ∈ ⟨t⟩, as in the Metropolis-Hastings case (see [6]; see, e.g., also [7, pp. 33-36], [9, Chapter 6], and [5, pp. 63-66]), namely, P_l = ((P_l)_ij) ∈ S_r,

(P_l)_ij = 0, if j ≠ i and (Q_l)_ij = 0;
(P_l)_ij = (Q_l)_ij min(1, (π_j (Q_l)_ji)/(π_i (Q_l)_ij)), if j ≠ i and (Q_l)_ij > 0;
(P_l)_ii = 1 − Σ_{k∈S, k≠i} (P_l)_ik,
∀l ∈ ⟨t⟩. Set P = P1P2⋯Pt.

Theorem 1.11 ([8]). Concerning P above, we have πP = π and P > 0.

Proof. See [8].

By Theorem 1.11, P^n → e′π as n → ∞. We call the Markov chain with transition matrix P the hybrid Metropolis-Hastings chain. In particular, we call this chain the hybrid Metropolis chain when Q1, Q2, ..., Qt are symmetric matrices.

We call the conditions (C1)-(C3) the basic conditions of the hybrid Metropolis-Hastings chain. In particular, we call these conditions the basic conditions of the hybrid Metropolis chain when Q1, Q2, ..., Qt are symmetric matrices. The basic conditions (C1)-(C3) and other conditions, which we call the special conditions, determine special hybrid Metropolis-Hastings chains. E.g., in [8] the next special hybrid Metropolis chain was considered. Supposing that Δ_l = (K_1^(l), K_2^(l), ..., K_{u_l}^(l)), ∀l ∈ ⟨t + 1⟩, this chain satisfies the conditions (C1)-(C3) and, moreover, the conditions:

(c1) |K_1^(l)| = |K_2^(l)| = ⋯ = |K_{u_l}^(l)|, ∀l ∈ ⟨t + 1⟩;
(c2) r = r1r2⋯rt with r1r2⋯rl = |Δ_{l+1}|, ∀l ∈ ⟨t − 1⟩, and rt = |K_1^(t)| (this condition is compatible with Δ1 ≻ Δ2 ≻ ⋯ ≻ Δ_{t+1});
(c3) (c3.1) Q_l is a symmetric matrix such that (c3.2) (Q_l)_ii > 0, ∀i ∈ S, and (Q_l)_{i1,j1} = (Q_l)_{i2,j2}, ∀i1, i2, j1, j2 ∈ S with i1 ≠ j1, i2 ≠ j2, and (Q_l)_{i1,j1}, (Q_l)_{i2,j2} > 0, ∀l ∈ ⟨t⟩ ((c3.2) says that all the positive entries of Q_l, excepting the entries (Q_l)_ii, i ∈ S, are equal, ∀l ∈ ⟨t⟩);
(c4) (Q_l)^U_K has in each row just one positive entry, ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀U ∈ Δ_{l+1} with U ⊆ K (this condition is compatible with (c3.2) because (Q_l)^W_V is a square matrix, ∀l ∈ ⟨t⟩, ∀V, W ∈ Δ_{l+1}).

The condition (c1) is superfluous because it follows from (C1) and (c4). (c2) is also superfluous because it follows from (c1) and Δ1 ≻ Δ2 ≻ ⋯ ≻ Δ_{t+1}. It is interesting to note that the matrices P1, P2, ..., Pt satisfy conditions similar to (C1)-(C3) and, for this special chain, moreover, (c4): simply, we replace Q_l with P_l, l ∈ ⟨t⟩, in (C1)-(C3) and, if need be, in (c4). (c1)-(c2) are common conditions for Q1, Q2, ..., Qt and P1, P2, ..., Pt.
In [8], for the chain satisfying the conditions (C1)-(C3) and (c1)-(c4), the positive entries of the matrices Q_l, l ∈ ⟨t⟩, were, taking Theorem 1.9 into account, optimally chosen, i.e., they were chosen such that the lower bound of α(P1P2⋯Pt) from Theorem 1.9 be as large as possible (we need this condition to obtain a chain with a speed of convergence as large as possible). More precisely, setting

f_l = min_{i,j∈S, (Q_l)_ij>0} π_j/π_i

(do not forget the condition (Q_l)_ij > 0!) and x_l = (Q_l)_ij, where i, j ∈ S are fixed such that i ≠ j and (Q_l)_ij > 0 (see (c3) again), it was found (taking Theorem 1.9 into account) that

x_l = 1/(f_l + r_l − 1).

We call this chain the optimal hybrid Metropolis chain with respect to the conditions (C1)-(C3) and (c1)-(c4) and the inequality from Theorem 1.9; we call it the optimal hybrid Metropolis chain for short.

In Section 3, we show that the Gibbs sampler on ⟨⟨h⟩⟩^n, h, n ≥ 1 (more generally, on ⟨⟨h1⟩⟩ × ⟨⟨h2⟩⟩ × ⋯ × ⟨⟨hn⟩⟩, h1, h2, ..., hn, n ≥ 1), belongs to our collection of hybrid Metropolis-Hastings chains. Moreover, we shall show that the Gibbs sampler on ⟨⟨h⟩⟩^n satisfies all the conditions (c1)-(c4), excepting (c3).

As to the estimate of ‖p_n − π‖ (p_n and π are defined below), we have the next result.

Theorem 1.12 (see, e.g., [8]). Let P ∈ S_r be an aperiodic irreducible matrix. Consider a Markov chain with transition matrix P and limit probability distribution π. Let p_n be the probability distribution of the chain at time n, n ≥ 0. Then

‖p_n − π‖ ≤ 2 ᾱ(P^n), ∀n ≥ 0

(P^0 = I_r; by Theorem 1.8(iii), ᾱ(P^n) ≤ (ᾱ(P))^n, ∀n ≥ 0, and ᾱ(P^n) ≤ (ᾱ(P^k))^{⌊n/k⌋}, ∀n, k, 1 ≤ k ≤ n (⌊x⌋ = max{b | b ∈ Z, b ≤ x}, ∀x ∈ R), etc.).

Proof. See, e.g., [8] (Theorem 1.8(ii) is used in the proof).

2. EXACT SAMPLING

In this section, we consider a similarity relation. This has some interesting properties. Then we consider two methods of exact generation of random variables, in the finite case only. The first one, the alias method, is a special case of the second one. For each of these methods, we associate a Markov chain
such that this chain can do what the method does. These associated chains are important for our unification. Finally, we associate a hybrid chain with a reference chain.

Definition 2.1. Let P, Q ∈ Ḡ_{Δ1,Δ2} ∩ N_{m,n}. We say that P is similar to Q if P⁺ = Q⁺. Write P ∼ Q when P is similar to Q.

Obviously, ∼ is an equivalence relation on Ḡ_{Δ1,Δ2}.

Theorem 2.2. Let P1, U1 ∈ Ḡ_{Δ1,Δ2} ∩ N_{m1,m2} and P2, U2 ∈ Ḡ_{Δ2,Δ3} ∩ N_{m2,m3}. Suppose that P1 ∼ U1 and P2 ∼ U2. Then P1P2 ∼ U1U2.

Proof. By Theorem 1.5 we have P1P2, U1U2 ∈ Ḡ_{Δ1,Δ3} ∩ N_{m1,m3}. By Theorem 1.5 and Definition 2.1 we have

(P1P2)⁺ = P1⁺P2⁺ = U1⁺U2⁺ = (U1U2)⁺.

Therefore, P1P2 ∼ U1U2.

Theorem 2.3. Let P1, U1 ∈ Ḡ_{Δ1,Δ2} ∩ N_{m1,m2}, P2, U2 ∈ Ḡ_{Δ2,Δ3} ∩ N_{m2,m3}, ..., Pn, Un ∈ Ḡ_{Δn,Δ_{n+1}} ∩ N_{mn,m_{n+1}}. Suppose that P1 ∼ U1, P2 ∼ U2, ..., Pn ∼ Un. Then P1P2⋯Pn ∼ U1U2⋯Un. If, moreover, Δ1 = (⟨m1⟩) and Δ_{n+1} = ({i})_{i∈⟨m_{n+1}⟩}, then P1P2⋯Pn = U1U2⋯Un (therefore, when Δ1 = (⟨m1⟩) and Δ_{n+1} = ({i})_{i∈⟨m_{n+1}⟩}, a product of n representatives, the first of an equivalence class included in Ḡ_{Δ1,Δ2}, the second of an equivalence class included in Ḡ_{Δ2,Δ3}, ..., the nth of an equivalence class included in Ḡ_{Δn,Δ_{n+1}}, does not depend on the choice of representatives).

Proof. The first part follows by Theorem 2.2 and induction. As to the second part, by Theorem 1.6, P1P2⋯Pn and U1U2⋯Un are stable matrices and, further,

(P1P2⋯Pn)_{{i}} = P1⁺P2⁺⋯Pn⁺ = U1⁺U2⁺⋯Un⁺ = (U1U2⋯Un)_{{i}}, ∀i ∈ ⟨m1⟩.

Therefore, P1P2⋯Pn = U1U2⋯Un.
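A small numerical instance of Theorem 2.3 (our own example, plain Python): two similar representatives P1 ∼ U1 give the same product with a common second factor P2 and, since Δ1 = (⟨4⟩) and Δ3 consists of singletons, that product is e′π.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

pi = [0.1, 0.3, 0.4, 0.2]
# P1, U1 in G_{Delta1,Delta2}, Delta1 = (S), Delta2 = ({1,2},{3,4}):
# every row puts total mass 0.4 on the block {1,2} and 0.6 on {3,4},
# hence P1+ = U1+ = (0.4, 0.6), i.e., P1 ~ U1
P1 = [[0.1, 0.3, 0.2, 0.4]] * 4          # spreads the mass inside each block
U1 = [[0.4, 0.0, 0.6, 0.0]] * 4          # one positive entry per block and row
# P2 in G_{Delta2,Delta3}, Delta3 = singletons: block diagonal, the rows of the
# block K all equal to pi restricted to K and normalized
P2 = [[0.25, 0.75, 0.0, 0.0], [0.25, 0.75, 0.0, 0.0],
      [0.0, 0.0, 2/3, 1/3], [0.0, 0.0, 2/3, 1/3]]

A, B = matmul(P1, P2), matmul(U1, P2)
for i in range(4):
    for j in range(4):
        assert abs(A[i][j] - B[i][j]) < 1e-12   # same product for both triples
        assert abs(A[i][j] - pi[j]) < 1e-12     # and the product is e'pi
```

The choice of where each row's positive entries sit inside the blocks is arbitrary; by Theorem 2.3, any representatives of the two equivalence classes yield the same product.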
The reader is assumed to be acquainted with the first method below. Recall that for each of the two methods below we associate a Markov chain such that this chain can do what the method does.

1. The alias method (see, e.g., [4] and [5]). To illustrate our Markovian modeling here, we consider, for simplification, the example from [5]. Following this example, we have a random variable X with the values 1, 2, 3, 4, 5 and probabilities π1 = 0.41, π2 = 0.27, π3 = 0.07, π4 = 0.14, π5 = 0.11, where π_i = P(X = i), ∀i ∈ ⟨5⟩. The alias method leads, following the example from [5] too, to the table (having 2 rows and 5 columns)

  column:   1          2          3          4          5
            0.07 (3)   0.11 (5)   0.14 (4)   0.19 (1)   0.20 (2)
            0.13 (1)   0.09 (1)   0.06 (2)   0.01 (2)   0

0.07, 0.13, etc. are probabilities, while 1, 2, 3, 4, 5 in parentheses are values of X. In each column of the table, the sum of the probabilities is equal to 0.20. We associate the alias method for generating X (when this method is applied to the generation of X) with the Markov chain (X_n)_{n≥0} with state space

S = {(3, 1), (1, 1), (5, 2), (1, 2), (4, 3), (2, 3), (1, 4), (2, 4), (2, 5)}

(if (x1, x2) ∈ S, then x1 denotes a value of X while x2 denotes the column of the table in which the value x1 is; for x2 = 5 (column 5), we only consider the state (2, 5) because in column 5 the second probability is 0) and transition matrix P = P1P2, where P1 ∈ G_{Δ1,Δ2} and P2 ∈ G_{Δ2,Δ3}, with

Δ1 = (S),
Δ2 = ({(3, 1), (1, 1)}, {(5, 2), (1, 2)}, {(4, 3), (2, 3)}, {(1, 4), (2, 4)}, {(2, 5)}),
Δ3 = ({(x, y)})_{(x,y)∈S}

(the rows and columns of P1 and P2 are labeled (3, 1), (1, 1), (5, 2), (1, 2), (4, 3), (2, 3), (1, 4), (2, 4), (2, 5) from left to right). P1 is a stochastic matrix each of whose rows gives probability 0.20 to each set of Δ2; e.g., (P1)_{(3,1)} = (0.20, 0, 0.20, 0, 0.20, 0, 0.20, 0, 0.20). P2 is the block diagonal matrix whose diagonal blocks, corresponding to the five sets of Δ2 in order, have all their rows equal to (0.35, 0.65), (0.55, 0.45), (0.70, 0.30), (0.95, 0.05), and (1), respectively.

By Theorem 1.6 it follows that P is a stable matrix and, more precisely,

P = e′ρ, where ρ = (0.07, 0.13, 0.11, 0.09, 0.14, 0.06, 0.19, 0.01, 0.20)

(see the table again). Recall, even if, here, P = e′ρ, that any 1-step transition of this chain is performed via P1, P2, i.e., doing two transitions: one using P1 and the other using P2. Passing this Markov chain from an initial state, say, (3, 1) (the state at time 0), to a state at time 1 is done using, one after the other, the probability distributions (P1)_{(3,1)} (this is the first row of the matrix P1); suppose that using this probability distribution the chain arrives at the state (i, j); and (P2)_{(i,j)}. The alias method for generating X uses these probability distributions too, in the same order (the 0's do not count, they can be removed); e.g., (P1)_{(3,1)} = (0.20, 0, 0.20, 0, 0.20, 0, 0.20, 0, 0.20) leads, removing the 0's, to

(0.20, 0.20, 0.20, 0.20, 0.20),

which is the probability distribution used by the alias method in its first step (when, obviously, this method is applied to X from here). Therefore, this chain can do what the alias method does (we need to run this chain just one step (or two steps, due to P1 and P2) until time 1 inclusive). By Theorem 2.3 we can replace P1 with any matrix, U1, similar to P1; obviously, it is more advantageous that each of the matrices (U1)^{{(3,1),(1,1)}}, (U1)^{{(5,2),(1,2)}}, (U1)^{{(4,3),(2,3)}}, (U1)^{{(1,4),(2,4)}}, (U1)^{{(2,5)}} have in each row just one positive entry. E.g., we can take U1 with all its rows equal to

(0.20, 0, 0.20, 0, 0.20, 0, 0.20, 0, 0.20)

and have P = P1P2 = U1P2.

2. The reference method for our collection of hybrid chains (in particular, for the Gibbs sampler). We call it the reference method for short. Its name, as well as the name of the chain determined by it, called the reference chain (see below for this chain), were inspired by the reference point from physics and other fields. The reference method is an iterative composition method; we include the degenerate case when no composition is done (this case corresponds to the case t = 1 of the reference method). Below we present the reference method.

Let X be a random variable with positive probability distribution π = (π1, π2, ..., πr) = (π_i)_{i∈S}, where S = ⟨r⟩. Let Δ1, Δ2, ..., Δ_{t+1} ∈ Par(S) with Δ1 = (S) ≻ Δ2 ≻ ⋯ ≻ Δ_{t+1} = ({i})_{i∈S}, where t ≥ 1. Set

a^(l)_{K,L} = (Σ_{i∈L} π_i)/(Σ_{i∈K} π_i), ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀L ∈ Δ_{l+1} with L ⊆ K.

Obviously,

a^(l)_{K,L} = P(X ∈ L | X ∈ K), ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀L ∈ Δ_{l+1} with L ⊆ K

(a^(1)_{S,L} = Σ_{i∈L} π_i = P(X ∈ L), ∀L ∈ Δ2),
and (a^(l)_{K,L})_{L∈K∩Δ_{l+1}} is a probability distribution on

K ∩ Δ_{l+1} = {K ∩ A | A ∈ Δ_{l+1}, K ∩ A ≠ ∅} = {B | B ∈ Δ_{l+1}, B ⊆ K}, ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l.

We generate the random variable X as follows. (See also discrete mixtures in, e.g., [4, p. 6], the decomposition method in, e.g., [4, p. 66], and the composition method in, e.g., [4, p. 66].)

Step 1. Generate L^(1) ∼ (a^(1)_{S,L})_{L∈S∩Δ2}. Suppose that we obtained L^(1) = L1 (L1 ∈ S ∩ Δ2 = Δ2). Set K2 = L1.
Step 2. Generate L^(2) ∼ (a^(2)_{K2,L})_{L∈K2∩Δ3}. Suppose that we obtained L^(2) = L2 (L2 ∈ K2 ∩ Δ3). Set K3 = L2.
⋮
Suppose that at Step t − 1 we obtained L^(t−1) = L_{t−1} (L_{t−1} ∈ K_{t−1} ∩ Δ_t). Set K_t = L_{t−1}.
Step t. Generate L^(t) ∼ (a^(t)_{K_t,L})_{L∈K_t∩Δ_{t+1}}. Suppose that we obtained L^(t) = L_t (L_t ∈ K_t ∩ Δ_{t+1}). Since Δ_{t+1} = ({i})_{i∈S}, it follows that ∃i ∈ S such that L_t = {i}. Set X = i; this value of X is generated according to its probability distribution π because, by the general multiplicative formula (see, e.g., [3, p. 6]), we have

P(X = i) = P(X ∈ {i}) = P(X ∈ L_t) = P(X ∈ L1 ∩ L2 ∩ ⋯ ∩ L_t) = a^(1)_{S,L1} a^(2)_{L1,L2} ⋯ a^(t)_{L_{t−1},L_t} = π_i.

The reference method is very fast if, practically speaking, we know the quantities a^(l)_{K,L}, ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀L ∈ Δ_{l+1} with L ⊆ K. Unfortunately, this does not happen in general if S is too large. But, fortunately, we can compute all or part of the quantities a^(l)_{K,L} when K and L are small; this is an important thing (see, e.g., in Section 3, the hybrid chains with P1).

To connect the reference method to our collection of hybrid chains, we associate the reference method for generating a (finite) random variable (when this method is applied to the generation of a (finite) random variable) with a (finite) Markov chain. To do this, first, recall that the partitions for the reference method are Δ1, Δ2, ..., Δ_{t+1} ∈ Par(S) with Δ1 = (S) ≻ Δ2 ≻ ⋯ ≻ Δ_{t+1} = ({i})_{i∈S}, where t ≥ 1. Let R1, R2, ..., Rt ∈ S_r such that

(A1) R1 ∈ G_{Δ1,Δ2}, R2 ∈ G_{Δ2,Δ3}, ..., Rt ∈ G_{Δt,Δ_{t+1}};
(A2) (R_l)^L_K = 0, ∀l ∈ ⟨t⟩ − {1}, ∀K, L ∈ Δ_l, K ≠ L (this assumption implies that R_l is a block diagonal matrix and a [Δ_l]-stable matrix on Δ_l, ∀l ∈ ⟨t⟩ − {1});
(A3) (R_l)^U_K has in each row just one positive entry, ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀U ∈ Δ_{l+1}, U ⊆ K (this assumption implies that (R_l)^U_K is a row-allowable matrix, ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀U ∈ Δ_{l+1}, U ⊆ K).

(A2) and (A3) are similar to (C2) and (c4) from Section 1, respectively. Therefore, the matrices P1, P2, ..., Pt of the hybrid chain (obviously, we refer to our hybrid Metropolis-Hastings chain) and the matrices R1, R2, ..., Rt have some things in common, respectively. This fact contributes to our unification (see Section 4).

Suppose that each positive entry of (the matrix) (R_l)^U_K is equal to a^(l)_{K,U} (by (A1) and (A3), all the positive entries of (R_l)^U_K are equal), ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀U ∈ Δ_{l+1}, U ⊆ K. Set R = R1R2⋯Rt. The following result is a main one both for the reference chain (this is defined below) and for (our) hybrid chains.

Theorem 2.4. Under the above assumptions, the following statements hold.
(i) R_lR_{l+1}⋯R_t is a block diagonal matrix, ∀l ∈ ⟨t⟩ − {1}, and a [Δ_l]-stable matrix, ∀l ∈ ⟨t⟩.
(ii) πR_lR_{l+1}⋯R_t = π, ∀l ∈ ⟨t⟩.
(iii) R is a stable matrix.
(iv) R = e′π.

Proof. (i) The first part follows from (A2). Now, we show the second part. By Theorem 1.5(i) and (A1), R_lR_{l+1}⋯R_t ∈ G_{Δ_l,Δ_{t+1}}. Consequently, R_lR_{l+1}⋯R_t is a [Δ_l]-stable matrix, ∀l ∈ ⟨t⟩, because Δ_{t+1} = ({i})_{i∈S} (see Definition 1.3).

Case 1. l = 1. Since Δ1 = (S), it follows that R1R2⋯Rt is a stable matrix (see Definition 1.4).

Case 2. l ∈ ⟨t⟩ − {1}. By (A2) it follows that R_lR_{l+1}⋯R_t is a [Δ_l]-stable matrix.

(ii) Let j ∈ S. Then ∃!U^(t) ∈ Δ_t such that {j} ⊆ U^(t) (∃! = there exists a unique). By (A1)-(A3) we have (R_t⁺)_{U^(t),{j}} > 0 and (R_t⁺)_{V,{j}} = 0, ∀V ∈ Δ_t, V ≠ U^(t). Further, since Δ_t ≺ Δ_{t−1}, ∃!U^(t−1) ∈ Δ_{t−1} such that U^(t) ⊆ U^(t−1). By (A1)-(A3) we have (R_{t−1}⁺)_{U^(t−1),U^(t)} > 0 and (R_{t−1}⁺)_{V,U^(t)} = 0, ∀V ∈ Δ_{t−1}, V ≠ U^(t−1). Proceeding in this way, we find a sequence U^(1), U^(2), ..., U^(t+1) such that U^(l) ∈ Δ_l, ∀l ∈ ⟨t + 1⟩,

{j} = U^(t+1) ⊆ U^(t) ⊆ ⋯ ⊆ U^(1) = S,
and

(R_l⁺)_{U^(l),U^(l+1)} > 0 and (R_l⁺)_{V,U^(l+1)} = 0, ∀l ∈ ⟨t⟩, ∀V ∈ Δ_l, V ≠ U^(l).

Let l ∈ ⟨t⟩. Let i ∈ U^(l) (j ∈ U^(l) as well). By (i) (the fact that R_lR_{l+1}⋯R_t is a [Δ_l]-stable matrix), (A1)-(A2), and Theorem 1.5 we have

(R_lR_{l+1}⋯R_t)_ij = (R_lR_{l+1}⋯R_t)⁺_{U^(l),{j}} = (R_l⁺R_{l+1}⁺⋯R_t⁺)_{U^(l),{j}} = (R_l⁺)_{U^(l),U^(l+1)} (R_{l+1}⁺)_{U^(l+1),U^(l+2)} ⋯ (R_t⁺)_{U^(t),{j}} = a^(l)_{U^(l),U^(l+1)} a^(l+1)_{U^(l+1),U^(l+2)} ⋯ a^(t)_{U^(t),{j}} = π_j / Σ_{k∈U^(l)} π_k

(it was to be expected, see (i), that this ratio does not depend on i ∈ U^(l)). Consequently, πR_lR_{l+1}⋯R_t = π.

(iii) This follows by (i) or (iv).

(iv) By the proof of (ii) we have R_ij = π_j, ∀i, j ∈ S. Therefore, R = e′π.

We associate the reference method for generating X with the Markov chain (this depends on X too) with state space S = ⟨r⟩ and transition matrix R = R1R2⋯Rt. Recall, even if, here, by Theorem 2.4(iv), R = e′π, that any 1-step transition of this chain is performed via R1, R2, ..., Rt, i.e., doing t transitions: one using R1, one using R2, ..., one using Rt. We call the above Markov chain the reference (Markov) chain. This is another example of a chain with finite convergence time (see, e.g., also [7] for other examples of chains with finite convergence time). The best case is when we know the quantities a^(l)_{K,L}, ∀l ∈ ⟨t⟩, ∀K ∈ Δ_l, ∀L ∈ Δ_{l+1}, L ⊆ K; we can always know all or part of the quantities a^(l)_{K,L} when K and L are small, a happy case!

Passing the reference chain from an initial state, say, 1 (the state at time 0), to a state at time 1 is done using, one after the other, the probability distributions (R1)_{{1}} (this is the first row of the matrix R1), (R2)_{{i2}}, ..., (Rt)_{{it}}, where i_l = the state at which the chain arrives using (R_{l−1})_{{i_{l−1}}}, ∀l ∈ ⟨t⟩ − {1}, setting i_1 = 1.
The reference method for generating X uses these probability distributions too, in the same order (the 0's do not count, they can be removed); e.g., if

(R1)_{{1}} = (a^(1)_{S,K1^(2)}, 0, 0, a^(1)_{S,K2^(2)}),

where S = ⟨4⟩, K1^(2) = {1, 2}, and K2^(2) = {3, 4}, then, removing the 0's, we obtain

(a^(1)_{S,K1^(2)}, a^(1)_{S,K2^(2)}),
which is the probability distribution used by the reference method in its first step, being, here, equal to (a^(1)_{S,K1^(2)}, a^(1)_{S,K2^(2)}). Therefore, the reference chain can do what the reference method does (we need to run this chain just one step (or t steps, due to R1, R2, ..., Rt) until time 1 inclusive).

To illustrate the reference chain, we consider a random variable X with probability distribution π = (π1, π2, ..., π8). Taking the partitions

Δ1 = (⟨8⟩), Δ2 = ({1, 2, 3, 4}, {5, 6, 7, 8}), Δ3 = ({1, 2}, {3, 4}, {5, 6}, {7, 8}), Δ4 = ({i})_{i∈⟨8⟩},

a reference chain is the Markov chain with state space S = ⟨8⟩ and transition matrix R = R1R2R3, where R1 is an 8×8 stochastic matrix each of whose rows has exactly one positive entry in the columns of K1^(2) = {1, 2, 3, 4}, equal to

a^(1)_{S,K1^(2)} = π1 + π2 + π3 + π4,

and exactly one positive entry in the columns of K2^(2) = {5, 6, 7, 8}, equal to

a^(1)_{S,K2^(2)} = π5 + π6 + π7 + π8

(R1 ∈ G_{Δ1,Δ2});

R2 = diag(R2^(1), R2^(2))

is block diagonal, where R2^(1) is a 4×4 stochastic matrix each of whose rows has exactly one positive entry in the columns of K1^(3) = {1, 2}, equal to

a^(2)_{K1^(2),K1^(3)} = (π1 + π2)/(π1 + π2 + π3 + π4),

and exactly one positive entry in the columns of K2^(3) = {3, 4}, equal to

a^(2)_{K1^(2),K2^(3)} = (π3 + π4)/(π1 + π2 + π3 + π4),

and R2^(2) is a 4×4 stochastic matrix built similarly from K3^(3) = {5, 6} and K4^(3) = {7, 8}, with

a^(2)_{K2^(2),K3^(3)} = (π5 + π6)/(π5 + π6 + π7 + π8), a^(2)_{K2^(2),K4^(3)} = (π7 + π8)/(π5 + π6 + π7 + π8)

(R2 ∈ G_{Δ2,Δ3}; moreover, it is a [Δ2]-stable matrix on Δ2, a Δ2-stable matrix on Δ3, and a block diagonal matrix); and

R3 = diag(R3^(1), R3^(2), R3^(3), R3^(4)),

where R3^(w) is the 2×2 matrix with both rows equal to

(a^(3)_{Kw^(3),K_{2w−1}^(4)}, a^(3)_{Kw^(3),K_{2w}^(4)}), K_i^(4) = {i}, ∀i ∈ ⟨8⟩,

a^(3)_{Kw^(3),K_{2w−1}^(4)} = π_{2w−1}/(π_{2w−1} + π_{2w}), a^(3)_{Kw^(3),K_{2w}^(4)} = π_{2w}/(π_{2w−1} + π_{2w}), ∀w ∈ ⟨4⟩

(R3 ∈ G_{Δ3,Δ4}; moreover, it is a [Δ3]-stable matrix on Δ3, a Δ3-stable matrix (because Δ4 = ({i})_{i∈⟨8⟩}, see Definition 1.4), and a block diagonal matrix). By Theorem 2.4(iv) or direct computation we have R = e′π.

Warning! In the above example, R1, R2, R3 are representatives of certain equivalence classes: R_l, where l ∈ ⟨3⟩, is a representative of the equivalence class determined by the quantities a^(l)_{K,L}, K ∈ Δ_l, L ∈ Δ_{l+1}, L ⊆ K (the number of elements of this class can easily be determined; e.g., R1 belongs to an equivalence class with cardinal equal to 4^16 because (R1)^{K1^(2)} and (R1)^{K2^(2)} are 8×4 matrices, each having exactly one positive entry in each of its 8 rows). Each triple (R1, R2, R3) of representatives determines a reference chain, all these chains having the product R1R2R3 equal to e′π (see Theorems 2.3 and 2.4(iv)).
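The 8-state reference chain above can be assembled programmatically. A minimal sketch (plain Python; states are 0-indexed here, π is an arbitrary positive distribution of our choosing, and, as the representative of each equivalence class, each row's positive entry is placed on the first state of the corresponding sub-block; by Theorem 2.3, any admissible placement yields the same product):

```python
pi = [0.05, 0.10, 0.15, 0.10, 0.20, 0.10, 0.05, 0.25]
parts = [[list(range(8))],
         [[0, 1, 2, 3], [4, 5, 6, 7]],
         [[0, 1], [2, 3], [4, 5], [6, 7]],
         [[i] for i in range(8)]]

def a(K, L):
    # a^(l)_{K,L} = pi(L)/pi(K)
    return sum(pi[i] for i in L) / sum(pi[i] for i in K)

def R_level(l):
    # R_l: from any state of a block K of Delta_l, jump to a sub-block L of
    # Delta_{l+1} with probability a(K, L); the mass lands on the first state
    # of L (one admissible representative of the equivalence class of R_l)
    R = [[0.0] * 8 for _ in range(8)]
    for K in parts[l - 1]:
        for i in K:
            for L in parts[l]:
                if set(L) <= set(K):
                    R[i][L[0]] += a(K, L)
    return R

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

R1, R2, R3 = R_level(1), R_level(2), R_level(3)
R = matmul(matmul(R1, R2), R3)
# Theorem 2.4(iv): R = e'pi, i.e., every row of R equals pi
for row in R:
    for p, q in zip(row, pi):
        assert abs(p - q) < 1e-12
```

Running the chain one step (i.e., one transition via R1, then R2, then R3) therefore samples exactly from π, which is precisely what the reference method does.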
18 430 Udrea Päun 8 Now, it is easy to see that the chain associated with the alias method is a special case of the reference chain. Therefore (this was to be expected), the alias method is a special case of the reference method. Another interesting special case of the reference method (and of the reference chain) is the method of uniform generation of the random permutations of order n from, e.g., Example. in [7]. In Example. from [7], it is presented and analyzed a Markov chain which can do what the swapping method does (for the swapping method, see, e.g., [4, pp ]). When the probability distribution of interest, π, is uniform, we have a (l) K,L = L K, l t, K l, L l+, L K another happy case! We here supposed that there are t + partitions; in Example. from [7], t = n. Finally, for the next sections, we need to associate a hybrid Metropolis- Hastings chain with a reference chain. Below we state the terms of this association. Remark.5. This association makes sense (warning!) if, obviously, both chains are dened by means of the same state space S, the same probability distribution (of interest) π on S, and the same sequence of partitions,,..., t+ on S with = (S)... t+ = ({i}) i S, where t. We shall use expressions as the hybrid Metropolis-Hastings chain and its reference chain, the Gibbs sampler and its reference chain, the reference chain of a hybrid Metropolis-Hastings chain, etc., meaning that both chains from each expression are associated in this manner and the reference chain (with transition matrix R = R R...R t ) from each expression has the matrices R, R,..., R t specied or not, in the latter case, R, R,..., R t are only from the equivalence classes R, R,..., R t, respectively ( R l = the equivalence class of R l, l t ), so, the reader, in this latter case, has a complete freedom to choose the matrices R, R,..., R t as he/she wishes. 
The association from Remark.5 is good for our unication and, as a result of this, is good for comparisons and improvements (see the next sections). 3. EXACT SAMPLING USING HYBRID CHAINS In this section, rst, we show that the Gibbs sampler on h n (i.e., on {0,,..., h} n ), h, n, belongs to our collection of hybrid Metropolis-Hastings chains from [8]. Second, we give some interesting classes of probability distributions they are interesting because: ) supposing that the generation time is not limited, we can generate any random variable with probability distribution belonging to the union of these classes exactly (not approximately) by Gibbs sampler (sometimes by optimal hybrid Metropolis chain) or by a special Gibbs
19 9 G method in action: from exact sampling to approximate one 43 sampler with grouped coordinates in just one step; ) sometimes, the Gibbs sampler or a special Gibbs sampler with grouped coordinates is identical with its reference chain. An application on the random variables with geometric distribution is given. Third, results on the hybrid chains or reference chains are given. We begin the denition of Gibbs sampler we refer to the cyclic Gibbs sampler. Below we consider the (cyclic) Gibbs sampler on h n, h, n (more generally, we can consider the state space h h... h n, h, h,..., h n, n ), see [0]; see, e.g., also [, 3, 6], [7, pp. 364], [8], [9, Chapter 5], [, pp. and 554], [5, pp. 698], and [, Chapters 5 and 7]. Recall that the entry (i, j) of a matrix Z is denoted Z ij or, if confusion can arise, Z i j. We use the convention that an empty term vanishes. Let x = (x, x,..., x n ) S = h n, h, n. Set x [k l ] = (x, x,..., x l, k, x l+,..., x n ), k h, l n (consequently, x [k l ] S, k h, l n ). Let π be a positive probability distribution on S = h n (h, n ). Set the matrices P l, l n, where 0 if y x [k l ], k h, (P l ) xy = π x[k l ] π if y = x [k l ] for some k h, x[j l ] j h l n, x, y S. Set P = P P...P n. Consider the Markov chain with state space S = h n (h, n ) and transition matrix P above. This chain is called the cyclic Gibbs sampler the Gibbs sampler for short. For labeling the rows and columns of P, P,..., P n and other things, we consider the states of S = h n in lexicographic order, i.e., in the order (0, 0,..., 0), (0, 0,..., 0, ),..., (0, 0,..., 0, h), (0, 0,..., 0,, 0), (0, 0,..., 0,, ),..., (0, 0,..., 0,, h),..., (0, 0,..., 0, h, 0), (0, 0,..., 0, h, ),..., (0, 0,..., 0, h, h),..., (h, h,..., h, 0), (h, h,..., h, ),..., (h, h,..., h). Further, we show that the Gibbs sampler on h n, h, n (more generally, on h h... 
$\{0,\dots,h_n\}$, $h_1,h_2,\dots,h_n\ge 1$, $n\ge 1$) belongs to our collection of hybrid Metropolis-Hastings chains from [8] and satisfies, moreover, the conditions (c1)–(c4), excepting (c3). More precisely, we show that the Gibbs sampler on $\{0,1,\dots,h\}^n$ satisfies all the conditions (C1)–(C3) (basic conditions) and (c1)–(c4) (special conditions), excepting (c3), and the equations from the definition of a hybrid Metropolis-Hastings chain. To see this, following the second special case from Section 3 in [8] (there it was considered a more
general framework, namely, when the coordinates are grouped (blocked) into groups (blocks) of size $v$), set
$$K_{(x_1,x_2,\dots,x_l)}=\{(y_1,y_2,\dots,y_n)\mid (y_1,y_2,\dots,y_n)\in S \text{ and } y_i=x_i,\ \forall i,\ 1\le i\le l\},$$
$1\le l\le n$, $x_1,x_2,\dots,x_l\in\{0,1,\dots,h\}$ (obviously, $K_{(x_1,x_2,\dots,x_n)}=\{(x_1,x_2,\dots,x_n)\}$), and
$$\Delta_1=(S),\qquad \Delta_{l+1}=\big(K_{(x_1,x_2,\dots,x_l)}\big)_{x_1,x_2,\dots,x_l\in\{0,1,\dots,h\}},\quad 1\le l\le n.$$
Obviously, $\Delta_1=(S)\preceq\Delta_2\preceq\cdots\preceq\Delta_{n+1}=(\{x\})_{x\in S}$. Note also that the sets $S$, $K_{(x_1,x_2,\dots,x_l)}$, $x_1,x_2,\dots,x_l\in\{0,1,\dots,h\}$, $1\le l\le n$, determine, by the inclusion relation, a tree, which we call the tree of inclusions. For simplification, below we give the tree of inclusions for $S=\{0,1\}^n$:

S
K_{(0)}  K_{(1)}
K_{(0,0)}  K_{(0,1)}  K_{(1,0)}  K_{(1,1)}
...
K_{(0,0,...,0)}  K_{(0,0,...,0,1)}  ...  K_{(1,1,...,1,0)}  K_{(1,1,...,1)}

Following, e.g., [7, pp. 4], we define the matrices $Q_1,Q_2,\dots,Q_n$ as follows: $Q_l=P_l$, $1\le l\le n$. It is easy to prove that the matrices $Q_l$, $1\le l\le n$, satisfy the basic conditions (C1)–(C3) from Section 2. Further, it is easy to prove that the matrices $P_l$ and $Q_l$ satisfy the equations
$$(P_l)_{xy}=\begin{cases}0 & \text{if } y\ne x \text{ and } (Q_l)_{xy}=0,\\[4pt] (Q_l)_{xy}\min\Big(1,\dfrac{\pi_y (Q_l)_{yx}}{\pi_x (Q_l)_{xy}}\Big) & \text{if } y\ne x \text{ and } (Q_l)_{xy}>0,\\[4pt] 1-\displaystyle\sum_{z\in S,\,z\ne x}(P_l)_{xz} & \text{if } y=x,\end{cases}$$
$1\le l\le n$, $x,y\in S$. (Further, it follows that the conclusion of Theorem 2.1 holds, in particular, for $P=P_1P_2\cdots P_n$.) Therefore, the Gibbs sampler on $\{0,1,\dots,h\}^n$ belongs to our collection of hybrid Metropolis-Hastings chains from [8]. Now,
it is easy to prove that the Gibbs sampler on $\{0,1,\dots,h\}^n$ satisfies, moreover, the special conditions (c1)–(c4), excepting (c3). Finally, we have the following result.

Theorem 3.1. The Gibbs sampler on $\{0,1,\dots,h\}^n$ belongs to our collection of hybrid Metropolis-Hastings chains. Moreover, this chain satisfies the conditions (c1)–(c4), excepting (c3).

Proof. See above.

Based on Theorem 3.1, it is easy now to show that the chain on $\{0,1,\dots,h\}^n$, $h\ge1$, $n\ge1$ (more generally, on $\{0,\dots,h_1\}\times\{0,\dots,h_2\}\times\cdots\times\{0,\dots,h_n\}$, $h_1,h_2,\dots,h_n\ge1$, $n\ge1$), defined below is, according to our definition, a hybrid Metropolis-Hastings chain which satisfies, moreover, all or part of the conditions (c1)–(c4). This chain on $\{0,1,\dots,h\}^n$ is a generalization of the (cyclic) Gibbs sampler on $\{0,1,\dots,h\}^n$ as follows: the matrices $Q_1,Q_2,\dots,Q_n$ of the Gibbs sampler (see before Theorem 3.1) are, more generally, replaced with matrices $Q_1,Q_2,\dots,Q_n$ (we used the same notation for these) such that $\overline{Q}_1,\overline{Q}_2,\dots,\overline{Q}_n$ of the former matrices are identical with $\overline{Q}_1,\overline{Q}_2,\dots,\overline{Q}_n$ of the latter matrices, respectively; $P=P_1P_2\cdots P_n$ is the transition matrix of this chain, where, using the Metropolis-Hastings rule, $P_1,P_2,\dots,P_n$ are defined by means of the more general matrices $Q_1,Q_2,\dots,Q_n$, respectively.

Since we now know the structure of the matrices $P_1,P_2,\dots,P_n$ corresponding to the update of coordinates $1,2,\dots,n$, respectively, we could study other types of Gibbs samplers on $\{0,1,\dots,h\}^n$, $h\ge1$, $n\ge1$ (the random Gibbs sampler, etc., see, e.g., [], [8], [5, pp. 773], and [9]), and, more generally, other types of chains on $\{0,1,\dots,h\}^n$, $h\ge1$, $n\ge1$ (a generalization of the random Gibbs sampler, etc.), derived from the generalization of the Gibbs sampler given in the above paragraph.

Recall that $\mathbb{R}_+=\{x\mid x\in\mathbb{R}\text{ and }x>0\}$. Recall that the states of $S=\{0,1,\dots,h\}^n$ are considered in lexicographic order.

Theorem 3.2. Let $S=\{0,1,\dots,h\}^n$, $h\ge1$, $n\ge1$. Let $w=(h+1)^t-1$, $0\le t\le n$.
Consider on $S$ the probability distribution
$$\pi=\big(c_0, c_0a, \dots, c_0a^{w},\ c_1, c_1a, \dots, c_1a^{w},\ \dots,\ c_h, c_ha, \dots, c_ha^{w},\ \dots,\ c_0, c_0a, \dots, c_0a^{w},\ c_1, c_1a, \dots, c_1a^{w},\ \dots,\ c_h, c_ha, \dots, c_ha^{w}\big)$$
(the sequence $c_0, c_0a, \dots, c_0a^{w}, c_1, c_1a, \dots, c_1a^{w}, \dots, c_h, c_ha, \dots, c_ha^{w}$ appears $(h+1)^{n-t-1}$ times if $0\le t<n$, and $c_0, c_0a, \dots, c_0a^{w}$ only appears if $t=n$), where $c_0,c_1,\dots,c_h,a\in\mathbb{R}_+$. Then, for the Gibbs sampler and, when $h=1$, for the optimal hybrid Metropolis chain with the matrices $Q_1,Q_2,\dots,Q_n$ such that $\overline{Q}_1,\overline{Q}_2,\dots,\overline{Q}_n$ are identical with $\overline{Q}_1,\overline{Q}_2,\dots,\overline{Q}_n$ of the Gibbs sampler on
$S=\{0,1\}^n$, respectively, we have, using the same notation, $P=P_1P_2\cdots P_n$, for the transition matrices of these two chains, $P=e\pi$ (therefore, the stationarity of these chains is attained at time 1).

Proof. Since (see the proof of Theorem 3.1)
$$\Delta_1=(S),\quad \Delta_2=\big(K_{(0)},K_{(1)},\dots,K_{(h)}\big),\quad \Delta_3=\big(K_{(0,0)},K_{(0,1)},\dots,K_{(0,h)},K_{(1,0)},K_{(1,1)},\dots,K_{(1,h)},\dots,K_{(h,0)},K_{(h,1)},\dots,K_{(h,h)}\big),\quad \dots,\quad \Delta_{n+1}=(\{x\})_{x\in S},$$
we have $|S|=(h+1)^n$ ($|S|$ is the cardinal of $S$),
$$|K_{(0)}|=|K_{(1)}|=\cdots=|K_{(h)}|=(h+1)^{n-1},$$
$$|K_{(0,0)}|=|K_{(0,1)}|=\cdots=|K_{(0,h)}|=|K_{(1,0)}|=|K_{(1,1)}|=\cdots=|K_{(1,h)}|=\cdots=|K_{(h,0)}|=|K_{(h,1)}|=\cdots=|K_{(h,h)}|=(h+1)^{n-2},$$
$$\dots,\qquad |\{x\}|=(h+1)^0=1,\ \forall x\in S.$$
Let $1\le l\le n$. Let $K\in\Delta_l$ and $L\in\Delta_{l+1}$, $L\subseteq K$. Then there exist $v_1,v_2,\dots,v_l\in\{0,1,\dots,h\}$ such that
$$K=\begin{cases}S & \text{if } l=1,\\ K_{(v_1,v_2,\dots,v_{l-1})} & \text{if } l\ge 2,\end{cases}\qquad\text{and}\qquad L=K_{(v_1,v_2,\dots,v_l)}.$$
Let $x=(x_1,x_2,\dots,x_n)\in K$. It follows that $x_1=v_1$, $x_2=v_2$, ..., $x_{l-1}=v_{l-1}$ (these equations vanish when $l=1$) and, obviously,
$$x[v_l\,l]=(x_1,x_2,\dots,x_{l-1},v_l,x_{l+1},\dots,x_n)=(v_1,v_2,\dots,v_{l-1},v_l,x_{l+1},\dots,x_n)\in L.$$
Note also that $|K|=(h+1)^{n-l+1}$, $|L|=(h+1)^{n-l}$, and $(P_l)_{x\,x[v_l\,l]}>0$. (The reader, if he/she wishes, can use the notation $(P_l)_{x\ x[v_l\,l]}$ instead of $(P_l)_{xx[v_l\,l]}$.)
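As an aside (not part of the paper's proof), the claim $P=e\pi$ of Theorem 3.2 can be verified numerically for small parameters. The sketch below is our illustration, with hypothetical helper names; it builds the Gibbs sampler factors $P_l$ for a wavy distribution with $h=2$, $n=3$, $t=2$ and checks that every row of $P=P_1P_2P_3$ equals $\pi$:

```python
# Numerical check (illustrative, not from the paper) of Theorem 3.2:
# for a wavy distribution on {0,...,h}^n, one full Gibbs sweep gives P = e*pi.
import itertools
import numpy as np

h, n, t = 2, 3, 2                 # state space {0,1,2}^3, w = (h+1)^t - 1 = 8
a, c = 1.7, [0.3, 1.0, 2.1]       # arbitrary positive parameters a and c_0,...,c_h

states = list(itertools.product(range(h + 1), repeat=n))   # lexicographic order
index = {x: i for i, x in enumerate(states)}

def weight(x):
    # wavy distribution: the last t coordinates give the exponent of a,
    # coordinate n - t selects c_i, as in the statement of the theorem
    e = sum(x[n - t + j] * (h + 1) ** (t - 1 - j) for j in range(t))
    return c[x[n - t - 1]] * a ** e

pi = np.array([weight(x) for x in states])
pi /= pi.sum()

def P_matrix(l):
    # Gibbs update of coordinate l (1-based): resample x_l from pi
    # conditioned on all the other coordinates
    M = np.zeros((len(states), len(states)))
    for x in states:
        ys = [x[:l - 1] + (k,) + x[l:] for k in range(h + 1)]
        total = sum(pi[index[y]] for y in ys)
        for y in ys:
            M[index[x], index[y]] = pi[index[y]] / total
    return M

P = np.linalg.multi_dot([P_matrix(l) for l in range(1, n + 1)])
print(np.allclose(P, np.outer(np.ones(len(states)), pi)))   # True: P = e*pi
```

(For these parameters the wavy distribution is in fact a product measure over the coordinates, which is why a single sweep already produces an exact sample.)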
First, we consider the Gibbs sampler. To compute the probabilities $(P_l)_{x\,x[v_l\,l]}$, we consider three cases: $n-l<t$; $n-l=t$; $n-l>t$. The case $n-l<t$ is a bit more difficult. In this case, the probabilities $\pi_x$ corresponding to the elements $x\in K$ are, keeping the order,
$$c_i a^{v},\ c_i a^{v+1},\ \dots,\ c_i a^{v+(h+1)^{n-l+1}-1}$$
for some $i$, $0\le i\le h$, and $v$, $0\le v\le w-(h+1)^{n-l+1}+1$, and those corresponding to the elements $x\in L$ are, keeping the order,
$$c_i a^{v+v_l(h+1)^{n-l}},\ c_i a^{v+v_l(h+1)^{n-l}+1},\ \dots,\ c_i a^{v+(v_l+1)(h+1)^{n-l}-1}.$$
We have
$$\frac{c_i a^{v+v_l(h+1)^{n-l}+z}}{\sum_{s=0}^{h} c_i a^{v+s(h+1)^{n-l}+z}}=\frac{a^{v_l(h+1)^{n-l}}}{\sum_{s=0}^{h} a^{s(h+1)^{n-l}}},\quad 0\le z\le (h+1)^{n-l}-1.$$
It follows that the first ratio does not depend on $z$, $0\le z\le (h+1)^{n-l}-1$. Moreover, it does not depend on $c_i$ and $v$. The other two cases are obvious. We now have
$$(P_l)_{x\,x[v_l\,l]}=\begin{cases}\dfrac{a^{v_l(h+1)^{b}}}{\sum_{s=0}^{h} a^{s(h+1)^{b}}} & \text{if } n-l=b \text{ for some } b,\ 0\le b\le t-1,\\[6pt] \dfrac{c_{v_l}}{\sum_{i=0}^{h} c_i} & \text{if } n-l=t,\\[6pt] \dfrac{1}{h+1} & \text{if } n-l>t.\end{cases}$$
Consequently, $P_1\in G_{\Delta_1,\Delta_2}$, $P_2\in G_{\Delta_2,\Delta_3}$, ..., $P_n\in G_{\Delta_n,\Delta_{n+1}}$. By Theorems 2.6, 2.1, and 3.1, $P=e\pi$.

Second, we consider the optimal hybrid Metropolis chain when $h=1$. In this case, $w=2^t-1$, $0\le t\le n$, and
$$\pi=\big(c_0, c_0a, \dots, c_0a^{w},\ c_1, c_1a, \dots, c_1a^{w},\ \dots,\ c_0, c_0a, \dots, c_0a^{w},\ c_1, c_1a, \dots, c_1a^{w}\big).$$
As to the positions of the positive entries of $Q_1,Q_2,\dots,Q_n$, we have, by hypothesis, $Q_1,Q_2,\dots,Q_n$ such that $\overline{Q}_1,\overline{Q}_2,\dots,\overline{Q}_n$ are identical with $\overline{Q}_1,\overline{Q}_2,\dots,\overline{Q}_n$ of
the Gibbs sampler on $S=\{0,1\}^n$ (see Section 2 and Theorem 3.1; see also the second special case from Section 3 in [8]), respectively. It follows that
$$f_l=\begin{cases}\min\big(a^{2^{b}}, a^{-2^{b}}\big) & \text{if } n-l=b \text{ for some } b,\ 0\le b\le t-1,\\[4pt] \min\Big(\dfrac{c_0}{c_1}, \dfrac{c_1}{c_0}\Big) & \text{if } n-l=t,\\[4pt] 1 & \text{if } n-l>t,\end{cases}$$
whence, treating separately the cases for $c_0$ and $c_1$ ($c_0\le c_1$, $c_0>c_1$; $c_0,c_1\in\mathbb{R}_+$) and the cases for $a$ ($a\le 1$, $a>1$; $a\in\mathbb{R}_+$), we obtain
$$(P_l)_{x\,x[v_l\,l]}=\begin{cases}\dfrac{(1-v_l)+v_l a^{2^{b}}}{1+a^{2^{b}}} & \text{if } n-l=b \text{ for some } b,\ 0\le b\le t-1,\\[6pt] \dfrac{c_{v_l}}{c_0+c_1} & \text{if } n-l=t,\\[6pt] \dfrac{1}{2} & \text{if } n-l>t.\end{cases}$$
Note that $(1-v_l)+v_l a^{2^{b}}=a^{v_l 2^{b}}$ because $v_l\in\{0,1\}$. It follows that these transition probabilities are identical with those for the Gibbs sampler when $h=1$. This is an interesting thing. See also Theorem 3.6 (the optimal hybrid Metropolis chain is not considered there because of this thing). Proceeding as in the Gibbs sampler case, it follows that $P=e\pi$.

We call the probability distribution from Theorem 3.2 the wavy probability distribution (of first type).

Remark 3.3. As to the class of wavy distributions from Theorem 3.2, the Gibbs sampler is better than the optimal hybrid Metropolis chain when the latter chain has the matrices $Q_1,Q_2,\dots,Q_n$ such that $\overline{Q}_1,\overline{Q}_2,\dots,\overline{Q}_n$ are identical with $\overline{Q}_1,\overline{Q}_2,\dots,\overline{Q}_n$ of the Gibbs sampler on $S=\{0,1,\dots,h\}^n$, respectively. For this see Theorem 3.2 and the following two examples.

(1) Consider (the probability distribution)
$$\pi=\big(c_0, c_0a, c_0a^2,\ c_1, c_1a, c_1a^2,\ c_2, c_2a, c_2a^2\big)$$
on $S=\{0,1,2\}^2$, where $c_0,c_1,c_2,a\in\mathbb{R}_+$, $c_i\ne c_j$, $\forall i,j$, $i\ne j$. $\pi$ is a wavy probability distribution. Suppose, for simplification, that $c_0<c_1<c_2$. By Theorem 3.2, for the Gibbs sampler, we have $P=e\pi$ ($P=P_1P_2$). It is easy to prove, for the optimal hybrid Metropolis chain, that $P\ne e\pi$ ($P=P_1P_2$; we used the same notation for matrices in both cases).

(2) Consider
$$\pi=\big(c, ca, \dots, ca^{w},\ c, ca, \dots, ca^{w},\ \dots,\ c, ca, \dots, ca^{w}\big)$$
on $S=\{0,1,\dots,h\}^n$, $c,a\in\mathbb{R}_+$, $h\ge1$, $n\ge1$. $\pi$ is also a wavy probability distribution (the case when $c_0=c_1=\cdots=c_h:=c$). By Theorem 3.2, for the Gibbs sampler, we have $P=e\pi$ ($P=P_1P_2\cdots P_n$). It is easy to prove, for the optimal hybrid Metropolis chain when, e.g., $\pi=\big(c, ca, \dots, ca^{8}\big)$ on $S=\{0,1,2\}^2$, $c,a\in\mathbb{R}_+$, $a\ne1$ (for $a=1$, $\pi$ = the uniform probability distribution), that $P\ne e\pi$ (we also used the same notation for matrices in both cases, with the only difference that $P=P_1P_2$ here).

By Theorem 3.2 and Remark 3.3, it is possible that, on $\{0,1,\dots,h\}^n$, the Gibbs sampler or a special generalization of it be the fastest chain in our collection of hybrid Metropolis-Hastings chains. The word fastest refers to Markov chains strictly, not to computers. The running time of our hybrid chains on a computer is another matter (the computational cost per step is the main problem; on a computer, a step of a Markov chain can be performed or not).

Example 3.4. Consider the probability distribution $\pi$ on $S=\{0,1\}^{100}$, where
$$\pi_{(0,0,\dots,0)}=d,\qquad \pi_x=\frac{1-d}{2^{100}-1},\ \forall x\in S,\ x\ne(0,0,\dots,0),$$
where $d\in(0,1)$ (e.g., $d=\tfrac12$, or $d=\tfrac34$, or $d=\tfrac{9}{10}$). Since the sampling from $S$ using the Gibbs sampler or optimal hybrid Metropolis chain can be intractable
(on any computer) for some $d$, one way is the breaking of $d$ into many pieces. For this we consider the probability distribution $\rho$ on $S'=\{0,1\}^{101}$, where
$$\rho_{(0,x)}=\frac{1-d}{2^{100}-1},\qquad \rho_{(1,x)}=\frac{2^{100}d-1}{(2^{100}-1)\,2^{100}},\quad \forall x\in S$$
($(0,x)$ and $(1,x)$ are vectors from $S'$). Since $\rho_{(0,x)}=\pi_x$, $\forall x\in S$, $x\ne(0,0,\dots,0)$, it follows that the sampling from $S$ can be performed via the sampling from $S'$. Indeed, letting $X$ be a random variable with the probability distribution $\pi$, if, using $\rho$ (on $S'$), we select a value equal to $(0,u)$ for some $u\in S$, $u\ne(0,0,\dots,0)$, then we set $X=u$ (this value of $X$ is selected according to its probability distribution $\pi$ on $S$) while, if, using $\rho$ too, we select a value equal to $(0,0,\dots,0)\in S'$ or $(1,v)$ for some $v\in S$, then we set $X=(0,0,\dots,0)$ (obviously, $(0,0,\dots,0)\in S$); this value of $X$ is also selected according to its probability distribution. By Theorem 3.2, the Gibbs sampler and the optimal hybrid Metropolis chain sample exactly (not approximately) from $S'$ (equipped with $\rho$); this implies that the sampling from $S$ (equipped with $\pi$) is also exact.

The wavy probability distribution(s) from Theorem 3.2 has (have) something in common with the geometric distribution. This fact suggests the next application.

Application 3.5. To generate, exactly, a random variable with geometric distribution $\big(p, pq, pq^2, \dots\big)$, $p,q\in(0,1)$, $q=1-p$, we can proceed as follows (see, e.g., also [4, p. 500]). We split the geometric distribution into two parts, a tail carrying small probability and a main body of size $2^n$, where $n$, $n\ge0$, is suitably chosen. The main body contains the first $2^n$ values of the geometric distribution and determines the probability distribution
$$\pi=\big(Zp,\ Zpq,\ Zpq^2,\ \dots,\ Zpq^{2^n-1}\big),\qquad\text{where } Z=\frac{1}{1-q^{2^n}}.$$
We choose the main body with the probability $1-q^{2^n}$ ($=p+pq+\cdots+pq^{2^n-1}$) and the tail with the probability $q^{2^n}$. (See also discrete mixtures in, e.g., [4, p. 6], the decomposition method in, e.g., [4, p. 66], and the composition method in, e.g., [4, p. 66].)
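The splitting just described can be sketched in code (our illustration, not the paper's; the function names are hypothetical, and plain inverse-transform sampling stands in here for the exact one-step sampling from the main body that the chains of Theorem 3.2 provide):

```python
# Exact generation of a Geometric(p) variable on {1, 2, 3, ...} by repeatedly
# splitting off main bodies of size m = 2**n, as in Application 3.5
# (illustrative sketch; names are ours).
import random

def sample_main_body(p, m, rng):
    # exact inverse-transform sample from pi = (Zp, Zpq, ..., Zpq^(m-1)),
    # Z = 1/(1 - q^m); the paper would run the Gibbs sampler /
    # optimal hybrid Metropolis chain here instead
    q = 1.0 - p
    u = rng.random() * (1.0 - q ** m)   # u ~ Uniform(0, 1 - q^m)
    k, cum = 1, p
    while u > cum and k < m:
        k += 1
        cum += p * q ** (k - 1)
    return k

def sample_geometric(p, n, rng=random):
    m = 2 ** n
    q = 1.0 - p
    shift = 0
    # tail chosen with probability q^m: by the lack-of-memory property,
    # X - m is again geometric, so we shift by m and split again
    while rng.random() < q ** m:
        shift += m
    return shift + sample_main_body(p, m, rng)

rng = random.Random(0)
samples = [sample_geometric(0.3, 3, rng) for _ in range(50_000)]
print(abs(sum(samples) / len(samples) - 1 / 0.3) < 0.1)
```

The sample mean approaches $1/p$, as it should for a geometric distribution on $\{1,2,3,\dots\}$.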
If the output of the choice is the main body, then we can sample exactly (not approximately) from $\{1,2,\dots,2^n\}$ (equipped with the probability distribution $\pi$), using the Gibbs sampler or the optimal hybrid Metropolis chain when $n\ge1$, see Theorem 3.2 (the stationarity is attained at time 1 for each of these chains). Obviously, to use the former or latter chain, we
need another distribution, $\mu$: we replace the probability distribution $\pi=(\pi_i)$ on $\{1,2,\dots,2^n\}$, $\pi_1=Zp$, $\pi_2=Zpq$, ..., $\pi_{2^n}=Zpq^{2^n-1}$, with $\mu=(\mu_x)$ on $\{0,1\}^n$,
$$\mu_{(0,0,\dots,0)}=\pi_1,\ \mu_{(0,0,\dots,0,1)}=\pi_2,\ \mu_{(0,0,\dots,0,1,0)}=\pi_3,\ \mu_{(0,0,\dots,0,1,1)}=\pi_4,\ \dots,\ \mu_{(1,1,\dots,1,0)}=\pi_{2^n-1},\ \mu_{(1,1,\dots,1)}=\pi_{2^n}.$$
Otherwise, i.e., if the output of the choice is the tail, we can proceed as follows. Supposing that $X$ is a random variable with the geometric distribution above, i.e., $\big(p, pq, pq^2, \dots\big)$, then, due to the lack-of-memory property of $X$, $X-2^n$ ($X>2^n$ here) is a random variable with the same geometric distribution as $X$, i.e., $\big(p, pq, pq^2, \dots\big)$. Therefore, further, we can work with $X-2^n$ and its probability distribution $\big(p, pq, pq^2, \dots\big)$ (we again split this distribution into two parts, a main body and a tail, ...), etc. The case when all the main bodies are of size $1$ ($2^0=1$) is well known, see, e.g., [4, p. 498]; we here gave a generalization of this case by the Gibbs sampler or optimal hybrid Metropolis chain.

The next result says that, sometimes, the Gibbs sampler (in some cases even the optimal hybrid Metropolis chain, see Theorem 3.2 and its proof and the next result) is identical with its reference chain.

Theorem 3.6. Consider on $S=\{0,1,\dots,h\}^n$, $h\ge1$, $n\ge1$, the wavy probability distribution $\pi$ from Theorem 3.2. Consider on $S$ (equipped with $\pi$) the Gibbs sampler with transition matrix $P=P_1P_2\cdots P_n$ and its reference chain with transition matrix $R=R_1R_2\cdots R_n$ (see Remark 2.5). Then
$$(P_l)_{xy}=a^{(l)}_{K,L},\quad \forall l,\ 1\le l\le n,\ \forall K\in\Delta_l,\ \forall L\in\Delta_{l+1} \text{ with } L\subseteq K,\ \forall x\in K,\ \forall y\in L \text{ with } (P_l)_{xy}>0$$
($\Delta_l$, $1\le l\le n+1$, are the partitions determined by the Gibbs sampler, see the proof of Theorem 3.2), and $P=R$. If, moreover, $\overline{R}_l=\overline{P}_l$, $\forall l$, $1\le l\le n$, then $P_l=R_l$, $\forall l$, $1\le l\le n$. (Therefore, under all the above conditions, the Gibbs sampler is identical with its reference chain, leaving the initial probability distribution aside.)

Proof.
First, we show that $(P_l)_{xy}=a^{(l)}_{K,L}$, $\forall l$, $1\le l\le n$, $\forall K\in\Delta_l$, $\forall L\in\Delta_{l+1}$ with $L\subseteq K$, $\forall x\in K$, $\forall y\in L$ with $(P_l)_{xy}>0$. For the Gibbs sampler on $S=\{0,1,\dots,h\}^n$, in the proof of Theorem 3.2, it was shown that
$$(P_l)_{x\,x[v_l\,l]}=\begin{cases}\dfrac{a^{v_l(h+1)^{b}}}{\sum_{s=0}^{h} a^{s(h+1)^{b}}} & \text{if } n-l=b \text{ for some } b,\ 0\le b\le t-1,\\[6pt] \dfrac{c_{v_l}}{\sum_{i=0}^{h} c_i} & \text{if } n-l=t,\\[6pt] \dfrac{1}{h+1} & \text{if } n-l>t.\end{cases}$$
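This three-case expression can be checked directly against the conditional probabilities defining the Gibbs update; the following small numerical sketch (ours, with hypothetical names, not part of the proof) does so for $h=2$, $n=3$, $t=1$:

```python
# Numerical check (ours, not the paper's) of the three-case formula above
# for a wavy distribution on {0,...,h}^n, here with h = 2, n = 3, t = 1.
import itertools

h, n, t = 2, 3, 1
a, c = 1.4, [0.5, 1.2, 2.0]

states = list(itertools.product(range(h + 1), repeat=n))

def weight(x):
    # wavy distribution: last t coordinates give the exponent of a,
    # coordinate n - t selects c_i
    e = sum(x[n - t + j] * (h + 1) ** (t - 1 - j) for j in range(t))
    return c[x[n - t - 1]] * a ** e

pi = {x: weight(x) for x in states}          # unnormalized weights suffice for ratios

def gibbs_prob(x, l, v):
    # (P_l)_{x, x[v l]}: resample coordinate l (1-based) conditionally on the rest
    ys = [x[:l - 1] + (k,) + x[l:] for k in range(h + 1)]
    return pi[x[:l - 1] + (v,) + x[l:]] / sum(pi[y] for y in ys)

def three_case_formula(l, v):
    b = n - l
    if b < t:
        return a ** (v * (h + 1) ** b) / sum(a ** (s * (h + 1) ** b) for s in range(h + 1))
    if b == t:
        return c[v] / sum(c)
    return 1.0 / (h + 1)                     # b > t: uniform update

for l in range(1, n + 1):
    for x in states:
        for v in range(h + 1):
            assert abs(gibbs_prob(x, l, v) - three_case_formula(l, v)) < 1e-12
print("three-case formula verified")
```

In particular, the update probabilities depend only on $l$ and $v_l$, not on the particular state $x$ within $K$, which is exactly the constancy needed for $(P_l)_{xy}=a^{(l)}_{K,L}$.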
Recall that
$$a^{(l)}_{K,L}=\frac{\sum_{x\in L}\pi_x}{\sum_{x\in K}\pi_x},\quad 1\le l\le n,\ K\in\Delta_l,\ L\in\Delta_{l+1},\ L\subseteq K,$$
see the reference method in Section 2. Recall that $|K|=(h+1)^{n-l+1}$ and $|L|=(h+1)^{n-l}$, see the proof of Theorem 3.2.

Case 1. $n-l=b$ for some $b$, $0\le b\le t-1$. By the proof of Theorem 3.2 we have
$$\sum_{x\in L}\pi_x=\sum_{z=0}^{(h+1)^{b}-1} c_i a^{v+v_l(h+1)^{b}+z}\qquad\text{and}\qquad \sum_{x\in K}\pi_x=\sum_{z=0}^{(h+1)^{b+1}-1} c_i a^{v+z}.$$
Since
$$\sum_{z=0}^{(h+1)^{b}-1} a^{v_l(h+1)^{b}+z}=a^{v_l(h+1)^{b}}\big(1+a+\cdots+a^{(h+1)^{b}-1}\big)$$
and
$$\sum_{z=0}^{(h+1)^{b+1}-1} a^{z}=\big(1+a+\cdots+a^{(h+1)^{b}-1}\big)+\big(a^{(h+1)^{b}}+a^{(h+1)^{b}+1}+\cdots+a^{2(h+1)^{b}-1}\big)+\cdots+\big(a^{h(h+1)^{b}}+a^{h(h+1)^{b}+1}+\cdots+a^{(h+1)^{b+1}-1}\big)=\big(1+a+\cdots+a^{(h+1)^{b}-1}\big)\big(1+a^{(h+1)^{b}}+a^{2(h+1)^{b}}+\cdots+a^{h(h+1)^{b}}\big),$$
we have
$$a^{(l)}_{K,L}=\frac{a^{v_l(h+1)^{b}}}{\sum_{s=0}^{h} a^{s(h+1)^{b}}}.$$

Case 2. $n-l=t$. By the definition of $\pi$ (see Theorem 3.2) and the proof of Theorem 3.2 we have
$$\sum_{x\in L}\pi_x=c_{v_l}+c_{v_l}a+\cdots+c_{v_l}a^{w}=c_{v_l}\big(1+a+\cdots+a^{w}\big)$$
and
$$\sum_{x\in K}\pi_x=\big(c_0+c_0a+\cdots+c_0a^{w}\big)+\big(c_1+c_1a+\cdots+c_1a^{w}\big)+\cdots+\big(c_h+c_ha+\cdots+c_ha^{w}\big)=\big(1+a+\cdots+a^{w}\big)\sum_{i=0}^{h} c_i.$$
Consequently,
$$a^{(l)}_{K,L}=\frac{c_{v_l}}{\sum_{i=0}^{h} c_i}.$$

Case 3. $n-l>t$. By the definition of $\pi$ and the proof of Theorem 3.2, setting
$$\sigma_L=\sum_{x\in L}\pi_x,$$
it is easy to see that
$$\sum_{x\in K}\pi_x=(h+1)\,\sigma_L.$$
Consequently,
$$a^{(l)}_{K,L}=\frac{1}{h+1}.$$
From Cases 1–3, we have $(P_l)_{x\,x[v_l\,l]}=a^{(l)}_{K,L}$. Therefore, $(P_l)_{xy}=a^{(l)}_{K,L}$ ($y=x[v_l\,l]$ for some $v_l$, $0\le v_l\le h$), $\forall l$, $1\le l\le n$, $\forall K\in\Delta_l$, $\forall L\in\Delta_{l+1}$ with $L\subseteq K$, $\forall x\in K$, $\forall y\in L$ with $(P_l)_{xy}>0$.

By Theorem 2.6 and the above result we have $P=e\pi$. By Theorem 2.4(iv), $R=e\pi$. Therefore, $P=R$. The other part of the conclusion is obvious.

In [8], we modified our hybrid (Metropolis-Hastings) chains such that the modified hybrid chains have better upper bounds for $\lVert p_n-\pi\rVert$ (see Theorem 2.1; see also Theorems 2.8 and 2.9). Below we present this modification. If $P=P_1P_2\cdots P_t$ (see Section 2) is the transition matrix of a hybrid Metropolis-Hastings chain, we replace the product $P_{s+1}P_{s+2}\cdots P_t$ ($1\le s<t$) by the block diagonal $\Delta_{s+1}$-stable matrix (recall that $\Delta_l=\big(K^{(l)}_1,\dots,K^{(l)}_{u_l}\big)$, $1\le l\le t+1$, see Section 2)
$$P'=P'(s)=A^{(s+1)}_1\oplus A^{(s+1)}_2\oplus\cdots\oplus A^{(s+1)}_{u_{s+1}}$$
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationEconomics 472. Lecture 10. where we will refer to y t as a m-vector of endogenous variables, x t as a q-vector of exogenous variables,
University of Illinois Fall 998 Department of Economics Roger Koenker Economics 472 Lecture Introduction to Dynamic Simultaneous Equation Models In this lecture we will introduce some simple dynamic simultaneous
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationDiscrete Applied Mathematics
Discrete Applied Mathematics 194 (015) 37 59 Contents lists available at ScienceDirect Discrete Applied Mathematics journal homepage: wwwelseviercom/locate/dam Loopy, Hankel, and combinatorially skew-hankel
More informationk-degenerate Graphs Allan Bickle Date Western Michigan University
k-degenerate Graphs Western Michigan University Date Basics Denition The k-core of a graph G is the maximal induced subgraph H G such that δ (H) k. The core number of a vertex, C (v), is the largest value
More informationACI-matrices all of whose completions have the same rank
ACI-matrices all of whose completions have the same rank Zejun Huang, Xingzhi Zhan Department of Mathematics East China Normal University Shanghai 200241, China Abstract We characterize the ACI-matrices
More informationHomework 10 Solution
CS 174: Combinatorics and Discrete Probability Fall 2012 Homewor 10 Solution Problem 1. (Exercise 10.6 from MU 8 points) The problem of counting the number of solutions to a napsac instance can be defined
More information(1) A frac = b : a, b A, b 0. We can define addition and multiplication of fractions as we normally would. a b + c d
The Algebraic Method 0.1. Integral Domains. Emmy Noether and others quickly realized that the classical algebraic number theory of Dedekind could be abstracted completely. In particular, rings of integers
More informationSTGs may contain redundant states, i.e. states whose. State minimization is the transformation of a given
Completely Specied Machines STGs may contain redundant states, i.e. states whose function can be accomplished by other states. State minimization is the transformation of a given machine into an equivalent
More informationGroups. 3.1 Definition of a Group. Introduction. Definition 3.1 Group
C H A P T E R t h r e E Groups Introduction Some of the standard topics in elementary group theory are treated in this chapter: subgroups, cyclic groups, isomorphisms, and homomorphisms. In the development
More informationContents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces
Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and
More informationA PRIMER ON SESQUILINEAR FORMS
A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form
More informationMATRICES. a m,1 a m,n A =
MATRICES Matrices are rectangular arrays of real or complex numbers With them, we define arithmetic operations that are generalizations of those for real and complex numbers The general form a matrix of
More informationAverage Reward Parameters
Simulation-Based Optimization of Markov Reward Processes: Implementation Issues Peter Marbach 2 John N. Tsitsiklis 3 Abstract We consider discrete time, nite state space Markov reward processes which depend
More informationLinear Algebra in Actuarial Science: Slides to the lecture
Linear Algebra in Actuarial Science: Slides to the lecture Fall Semester 2010/2011 Linear Algebra is a Tool-Box Linear Equation Systems Discretization of differential equations: solving linear equations
More informationMATH 61-02: PRACTICE PROBLEMS FOR FINAL EXAM
MATH 61-02: PRACTICE PROBLEMS FOR FINAL EXAM (FP1) The exclusive or operation, denoted by and sometimes known as XOR, is defined so that P Q is true iff P is true or Q is true, but not both. Prove (through
More informationBoolean Inner-Product Spaces and Boolean Matrices
Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver
More informationCombinations. April 12, 2006
Combinations April 12, 2006 Combinations, April 12, 2006 Binomial Coecients Denition. The number of distinct subsets with j elements that can be chosen from a set with n elements is denoted by ( n j).
More informationELA
SUBDOMINANT EIGENVALUES FOR STOCHASTIC MATRICES WITH GIVEN COLUMN SUMS STEVE KIRKLAND Abstract For any stochastic matrix A of order n, denote its eigenvalues as λ 1 (A),,λ n(a), ordered so that 1 = λ 1
More informationA fast algorithm to generate necklaces with xed content
Theoretical Computer Science 301 (003) 477 489 www.elsevier.com/locate/tcs Note A fast algorithm to generate necklaces with xed content Joe Sawada 1 Department of Computer Science, University of Toronto,
More information1. The Polar Decomposition
A PERSONAL INTERVIEW WITH THE SINGULAR VALUE DECOMPOSITION MATAN GAVISH Part. Theory. The Polar Decomposition In what follows, F denotes either R or C. The vector space F n is an inner product space with
More informationSUMS PROBLEM COMPETITION, 2000
SUMS ROBLEM COMETITION, 2000 SOLUTIONS 1 The result is well known, and called Morley s Theorem Many proofs are known See for example HSM Coxeter, Introduction to Geometry, page 23 2 If the number of vertices,
More informationChapter 3. Differentiable Mappings. 1. Differentiable Mappings
Chapter 3 Differentiable Mappings 1 Differentiable Mappings Let V and W be two linear spaces over IR A mapping L from V to W is called a linear mapping if L(u + v) = Lu + Lv for all u, v V and L(λv) =
More informationCanonical lossless state-space systems: staircase forms and the Schur algorithm
Canonical lossless state-space systems: staircase forms and the Schur algorithm Ralf L.M. Peeters Bernard Hanzon Martine Olivi Dept. Mathematics School of Mathematical Sciences Projet APICS Universiteit
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationA Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra
International Mathematical Forum, 4, 2009, no. 24, 1157-1171 A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra Zvi Retchkiman Königsberg Instituto Politécnico Nacional,
More informationMATH 56A: STOCHASTIC PROCESSES CHAPTER 1
MATH 56A: STOCHASTIC PROCESSES CHAPTER. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter
More informationELA
Volume 16, pp 171-182, July 2007 http://mathtechnionacil/iic/ela SUBDIRECT SUMS OF DOUBLY DIAGONALLY DOMINANT MATRICES YAN ZHU AND TING-ZHU HUANG Abstract The problem of when the k-subdirect sum of a doubly
More informationOn Projective Planes
C-UPPSATS 2002:02 TFM, Mid Sweden University 851 70 Sundsvall Tel: 060-14 86 00 On Projective Planes 1 2 7 4 3 6 5 The Fano plane, the smallest projective plane. By Johan Kåhrström ii iii Abstract It was
More informationExistence of Some Signed Magic Arrays
Existence of Some Signed Magic Arrays arxiv:1701.01649v1 [math.co] 6 Jan 017 Abdollah Khodkar Department of Mathematics University of West Georgia Carrollton, GA 30118 akhodkar@westga.edu Christian Schulz
More informationSimple Lie subalgebras of locally nite associative algebras
Simple Lie subalgebras of locally nite associative algebras Y.A. Bahturin Department of Mathematics and Statistics Memorial University of Newfoundland St. John's, NL, A1C5S7, Canada A.A. Baranov Department
More informationCHANGE OF BASIS AND ALL OF THAT
CHANGE OF BASIS AND ALL OF THAT LANCE D DRAGER Introduction The goal of these notes is to provide an apparatus for dealing with change of basis in vector spaces, matrices of linear transformations, and
More informationContents. 6 Systems of First-Order Linear Dierential Equations. 6.1 General Theory of (First-Order) Linear Systems
Dierential Equations (part 3): Systems of First-Order Dierential Equations (by Evan Dummit, 26, v 2) Contents 6 Systems of First-Order Linear Dierential Equations 6 General Theory of (First-Order) Linear
More informationA proof of the Jordan normal form theorem
A proof of the Jordan normal form theorem Jordan normal form theorem states that any matrix is similar to a blockdiagonal matrix with Jordan blocks on the diagonal. To prove it, we first reformulate it
More informationInstitute for Advanced Computer Studies. Department of Computer Science. On the Convergence of. Multipoint Iterations. G. W. Stewart y.
University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR{93{10 TR{3030 On the Convergence of Multipoint Iterations G. W. Stewart y February, 1993 Reviseed,
More informationOn the classication of algebras
Technische Universität Carolo-Wilhelmina Braunschweig Institut Computational Mathematics On the classication of algebras Morten Wesche September 19, 2016 Introduction Higman (1950) published the papers
More informationExample: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected
4. Markov Chains A discrete time process {X n,n = 0,1,2,...} with discrete state space X n {0,1,2,...} is a Markov chain if it has the Markov property: P[X n+1 =j X n =i,x n 1 =i n 1,...,X 0 =i 0 ] = P[X
More information[3] (b) Find a reduced row-echelon matrix row-equivalent to ,1 2 2
MATH Key for sample nal exam, August 998 []. (a) Dene the term \reduced row-echelon matrix". A matrix is reduced row-echelon if the following conditions are satised. every zero row lies below every nonzero
More information