Estimating the Number of Tables via Sequential Importance Sampling

Size: px

Start display at page:

Download "Estimating the Number of Tables via Sequential Importance Sampling"

Prosper Hutchinson
5 years ago
Views:

1 Estimating the Number of Tables via Sequential Importance Sampling Jing Xi Department of Statistics University of Kentucky Jing Xi, Ruriko Yoshida, David Haws

2 Introduction combinatorics social networks regulatory networks We consider one of the most generally used model: no three way interaction model. Model

3 Conditional Poisson Distribution Let Z = (Z,..., Z l ) be independent Bernoulli trials with probability of successes p = (p,..., p l ). Define w k = p k /( p k ) for k. Then Z follows a Conditional Poisson Distribution means, P(Z = z,..., Z l = z l S Z = c) l k= w z k k. () Z Z2 Z3 Z4 Z5 Z c=3

4 Conditional Poisson Distribution Let Z = (Z,..., Z l ) be independent Bernoulli trials with probability of successes p = (p,..., p l ). Define w k = p k /( p k ) for k. Then Z follows a Conditional Poisson Distribution means, P(Z = z,..., Z l = z l S Z = c) l k= w z k k. () Z Z2 Z3 Z4 Z5 Z c=3

5 Conditional Poisson Distribution Let Z = (Z,..., Z l ) be independent Bernoulli trials with probability of successes p = (p,..., p l ). Define w k = p k /( p k ) for k. Then Z follows a Conditional Poisson Distribution means, P(Z = z,..., Z l = z l S Z = c) l k= w z k k. () Z Z2 Z3 Z4 Z5 Z c=3

6 Conditional Poisson Distribution Let Z = (Z,..., Z l ) be independent Bernoulli trials with probability of successes p = (p,..., p l ). Define w k = p k /( p k ) for k. Then Z follows a Conditional Poisson Distribution means, P(Z = z,..., Z l = z l S Z = c) l k= w z k k. () Z Z2 Z3 Z4 Z5 Z c=3

7 Conditional Poisson Distribution Let Z = (Z,..., Z l ) be independent Bernoulli trials with probability of successes p = (p,..., p l ). Define w k = p k /( p k ) for k. Then Z follows a Conditional Poisson Distribution means, P(Z = z,..., Z l = z l S Z = c) Z l k= Z2 Z3 Z4 Z5 Z w z k k. () c=3

8 Sequential Importance Sampling (SIS) Σ, the set of all tables satisfying marginal conditions. p(x): Σ [, ], target distribution, the uniform distribution over Σ, p(x) = / Σ. q(x) > for all X Σ, the proposal distribution for sampling. We have E [ ] q(x) = X Σ which can be estimated Σ by q(x) = Σ q(x) Σ = N N i= q(x i ), where X,..., X N are tables drawn iid from q(x).

9 Sequential Importance Sampling (SIS) How to get the probability of the whole table X? Denote the columns of the table X as x,, x t. By the multiplication rule we have q(x = (x,, x t )) = q(x )q(x 2 x ) q(x t x t,..., x ). We can easily compute q(x i x i,..., x ) for i = 2, 3,..., t using Conditional Poisson distribution. What if we have rejections? Having rejections means that q(x): Σ [, ] where Σ Σ. The SIS estimator is still unbiased and consistent: [ ] IX Σ E = q(x) = Σ, q(x) q(x) X Σ I X Σ where I X Σ is an indicator function for the set Σ.

10 SIS for 2-way Table Theorem [Chen et. al., 25] For the uniform distribution over all m n - tables with given row sums r,..., r m and first column sum c, the marginal distribution of the first column is the same as the conditional distribution of Z given S Z = c with p i = r i /n

11 SIS for 2-way Table Theorem [Chen et. al., 25] For the uniform distribution over all m n - tables with given row sums r,..., r m and first column sum c, the marginal distribution of the first column is the same as the conditional distribution of Z given S Z = c with p i = r i /n

12 SIS for 2-way Table Theorem [Chen et. al., 25] For the uniform distribution over all m n - tables with given row sums r,..., r m and first column sum c, the marginal distribution of the first column is the same as the conditional distribution of Z given S Z = c with p i = r i /n

13 SIS for 2-way Table Theorem [Chen et. al., 25] For the uniform distribution over all m n - tables with given row sums r,..., r m and first column sum c, the marginal distribution of the first column is the same as the conditional distribution of Z given S Z = c with p i = r i /n. 2 qc 2

14 SIS for 2-way Table Theorem [Chen et. al., 25] For the uniform distribution over all m n - tables with given row sums r,..., r m and first column sum c, the marginal distribution of the first column is the same as the conditional distribution of Z given S Z = c with p i = r i /n. 2 qc * qc2 2

15 SIS for 2-way Tables with Structural Zero s A structural zero means a cell in the table that is fixed to be. Example Theorem [Yuguo Chen, 27] ["Hand Waving" version] Key: If (i, ) is not a structural : change p i = r i /n to p i = r i /(n g i ) where g i is the number of structural zeros in the ith row; otherwise p i =. Theorem [] n=6 [] [] p2=2/(6-2) p3= r2=2

16 SIS for 3-way Tables Theorem ["Hand Waving" version] For a cell (i, j, k ), 3 columns will go through it: (i, j, ), (i,, k ), (, j, k ). Key: When generating (i, j, ), let r = (i,, k ), c = (, j, k ), define r k = X i +k and c k = X +j k, then set: p k = r k c k r k c k + (n r k g r k )(m c k g c k ), where g r k, and g c k are the numbers of structural zeros in r and c, respectively. Theorem A similar strategy can be used in multi-way tables. Theorem

17 Algorithm Example: Semimagic Cube

18 Algorithm Example: Semimagic Cube

19 Algorithm Example: Semimagic Cube

20 Algorithm Example: Semimagic Cube

21 Algorithm Example: Semimagic Cube

22 Algorithm Example: Semimagic Cube

23 Simulations - Semimagic Cubes (Table 2) This tables lists the results from m m m tables with all marginals equals to. Dim m # tables Estimation cv 2 δ % % % e e % e e % e e % e e % δ: acceptance rate

24 Simulations - Semimagic Cubes (Table 3) We can also change marginal s. Dimension m s Estimation cv 2 δ e % e % e % e % e % e % e % e % e % δ: acceptance rate

25 Experiment - Sampson s Dataset It is a dataset about the social interactions among a group of monks recorded by Sampson. Data structure: Dimension: 8 8 Rows/Columns: the 8 monks. Levels: questions: liking (3 timepoints), disliking, esteem, disesteem, positive influenc, negative influence, praise and blame. Values: answers: 3 top choices were listed in original dataset, ranks were recorded. We set these ranks as an indicator ( if in top three choices, if not). N =, estimator is.74774e + 7, cv 2 = 62.4, acceptance rate is 3%.

26 Problems Still Open Our code performs good when marginals are close to each other. But for the opposite case, the acceptance rate can become very low. How can we reduce rejection rate? Possible idea: arrange the order of columns in different ways? How can Gale-Ryser Theorem be used for 3-way tables?

27 THANK YOU! Questions? Website for this paper:

28 Model Let X = (X ijk ) of size (m, n, l), where m, n, l N and N = {, 2,...}, be a table of counts whose entries are independent Poisson random variables with parameters, {θ ijk }. Here X ijk {, }. Consider the loglinear model, log(θ ijk ) = λ + λ M i + λ N j + λ L k + λmn ij + λ ML ik + λ NL jk (2) for i =,..., m, j =,..., n, and k =,..., l where M, N, and L denote the nominal-scale factors. This model is called no three-way interaction model. Notice that the sufficient statistics under the model in (2) are the two-way marginals. Back

29 Example for Structural Zero s How can structural zero s come? Different types of cancer separated by gender for Alaska in year 989: Type of cancer Female Male Total Lung Melanoma Ovarian 8 [] 8 Prostate [] Stomach 5 5 Total The structural zeros s are denoted by "[]" Back

30 Theorem [2-way Tables with Structural Zero s] Define the set of structural zeros Ω as: Ω = {(i, j) : (i, j) is a structural zero, i =,..., m, j =,..., n} Theorem [Yuguo Chen, 27] For the uniform distribution over all m n - tables with given row sums r,..., r m, first column sum c, and the set of structural zeros Ω, the marginal distribution of the first column is the same as the conditional distribution of Z given S Z = c with p i = I [(i,)/ Ω] r i /(n g i ) where g i is the number of structural zeros in the ith row. Back

31 Theorem [SIS for 3-way Tables] Theorem For the uniform distribution over all m n l - tables with structural zeros with given marginals r k = X i +k, c k = X +j k for k =, 2,..., l, and a fixed marginal for the factor L, l, the marginal distribution of the fixed marginal l is the same as the conditional distribution of Z given S Z = l with p k := r k c k r k c k + (n r k g r k )(m c k g c k ), where g r k is the number of structural zeros in the (r, k)th column and g c is the number of structural zeros in the (c, k)th column. k Back

32 Theorem [SIS for Multi-way Tables] Theorem For the uniform distribution over all d-way - contingency tables X = (X i...i d ) of size (n n d ), where n i N for i =,... d with marginals l = X i,...id +, and r j k = X i...i j +i j+...i d k for k {,..., n d }, the marginal distribution of the fixed marginal l is the same as the conditional distribution of Z given S Z = l with p k := d j= r j k d j= r j k + d j= (n j r j k gj k ) where g j k is the number of structural zeros in the (i,..., i j, i j+,..., i d, k)th column of X. Back

Estimating the number of zero-one multi-way tables via sequential importance sampling

Ann Inst Stat Math (2013) 65:763 783 DOI 10.1007/s10463-012-0392-7 Estimating the number of zero-one multi-way tables via sequential importance sampling Jing Xi Rurio Yoshida David Haws Received: 1 May