Computing Bi-Clusters for Microarray Analysis

Size: px
Start display at page:

Download "Computing Bi-Clusters for Microarray Analysis"

Transcription

1 December 21, 2006

2 General Bi-clustering Problem Similarity of Rows (1-5) Cheng and Churchs model General Bi-clustering Problem Input: a n m matrix A. Output: a sub-matrix A P,Q of A such that the rows of A P,Q are similar. That is, all the rows are identical. Why sub-matrix? A subset of genes are co-regulated and co-expressed under specific conditions. It is interesting to find the subsets of genes and conditions.

3 General Bi-clustering Problem Similarity of Rows (1-5) Cheng and Churchs model Similarity of Rows (1-5) 1. All rows are identical All the elements in a row are identical (the same as 1 if we treat columns as rows)

4 General Bi-clustering Problem Similarity of Rows (1-5) Cheng and Churchs model Similarity of Rows (1-5) 3. The curves for all rows are similar (additive) a i,j a i,k = c(j, k) for i = 1, 2,..., m. Case 3 is equivalent to case 2 (thus also case 1) if we construct a new matrix a i,j = a i,j a i,p for a fixed p indicate a row.

5 General Bi-clustering Problem Similarity of Rows (1-5) Cheng and Churchs model Similarity of Rows (1-5) 4. The curves for all rows are similar (multiplicative) a 1,1 a 1,2 a 1,3... a 1,m c 1 a 1,1 c 1 a 1,2 c 1 a 1,3... c 1 a 1,m c 2 a 1,1 c 2 a 1,2 c 2 a 1,3... c 2 a 1,m... c n a 1,1 c n a 1,2 c n a 1,3... c n a 1,m Transfer to case 2 (thus case 1) by taking log and subtraction. Case 3 and Case 4 are called bi-clusters with coherent values.

6 General Bi-clustering Problem Similarity of Rows (1-5) Cheng and Churchs model Similarity of Rows (1-5) 5. The curves for all rows are similar (multiplicative and additive) a i,j = c i a k,j + d i Transfer to case 2 (thus case 1) by subtraction of a fixed row (row i), taking log and subtraction of row i again. The basic model: All the rows in the sub-matrix are identical.

7 General Bi-clustering Problem Similarity of Rows (1-5) Cheng and Churchs model Cheng and Churchs model The model introduced a similarity score called the mean squared residue score H to measure the coherence of the rows and columns in the submatrix. where H(P, Q) = 1 P Q a i,q = 1 Q j Q i P,j Q a i,j, a P,j = 1 P (a i,j a i,q a P,j + a P,Q ) 2 i P a i,j, a P,Q = 1 P Q i P,j Q If there is no error, H(P, Q)=0 for case 1, 2 and 3. A lot of heuristics (programs) have been produced. a i,j.

8 Consensus Sub-matrix Problem Bottleneck Sub-matrix Problem NP-Hardness Results Consensus Sub-matrix Problem Bottleneck Sub-matrix Problem

9 Consensus Sub-matrix Problem Bottleneck Sub-matrix Problem NP-Hardness Results Consensus Sub-matrix Problem Input: a n m matrix A, integers l and k. Output: a sub-matrix A P,Q of A with l rows and k columns and a consensus row z (of k elements) such that r i P d(r i Q, z) is minimized. Here d(, ) is the Hamming distance.

10 Consensus Sub-matrix Problem Bottleneck Sub-matrix Problem NP-Hardness Results Bottleneck Sub-matrix Problem Input: a n m matrix A, integers l and k. Output: a sub-matrix A P,Q of A with l rows and k columns and a consensus row z (of k elements) such that for any r i in P d(r i Q, z) d and d is minimized Here d(, ) is the Hamming distance.

11 Consensus Sub-matrix Problem Bottleneck Sub-matrix Problem NP-Hardness Results NP-Hardness Results Theorem 1: Both consensus sub-matrix and bottleneck sub-matrix problems are NP-hard. Proof: We use a reduction from maximum edge bipartite problem.

12 for Consensus Sub-matrix Problem Input: a n m matrix A, integers l and k. Output: a sub-matrix A P,Q of A with l rows and k columns and a consensus row z (of k elements) such that r i P d(r i Q, z) is minimized. Here d(, ) is the Hamming distance.

13 Assumptions: H opt = p i P opt d(x pi Qopt, z opt) = O(kl), H opt c = kl and Q opt = k = O(n), k c = n. : We use a random sampling technique to randomly select O(logm) columns in Q opt, enumerate all possible vectors of length O(logm) for those columns. At some moment, we know O(logm) bits of r opt and we can use the partial z opt to select the l rows which are closest to z opt in those O(logm) bits. After that we can construct a consensus vector r as follows: for each column, choose the (majority) letter that appears the most in each of the l letters in the l selected rows. Then for each of the n columns, we can calculate the number of mismatches between the majority letter and the l letters in the l selected rows. By selecting the best k columns, we can get a good solution.

14 How to randomly select O(logm) columns in Q opt while Q opt is unknown? Our new idea is to randomly select a set B of (c + 1)logm columns from A and enumerate all size logm subsets of B in time O(m c+1) which is polynomial in terms of the input size O(mn). We can show that with high probability, we can get a set of logm columns randomly selected from Q opt.

15 Algorithm 1 for The Consensus Submatrix Problem Input: one m n matrix A, integers l and k, and ɛ > 0 Output: a size l subset P of rows, a size k subset Q of columns and a length k consensus vector z Step 1: randomly select a set B of (c + 1)( 4 log m + 1) columns from A. ɛ 2 (1.1) for every size 4 log m ɛ 2 subset R of B do (1.2) for every z R Σ R do (a) Select the best l rows P = {p 1,..., p l } that minimize d(z R, x i R ). (b) for each column j do Compute f (j) = l i=1 d(s j, a pi,j), where s j is the majority element of the l rows in P in column j. Select the best k columns Q = {q 1,..., q k } with minimum value f (j) and let z(q) = s q1 s q2... s qk. (c) Calculate H = l i=1 d(xp i Q, z) of this solution. Step 2: Output P, Q and z with minimum H.

16 Lemma 1: With probability at most m 2 ɛ 2 c 2 (c+1), no subset R of size 4 log m used in Step 1 of Algorithm 1 satisfies ɛ 2 R Q opt. Lemma 2: Assume R = 4 log m and R Q ɛ 2 opt. Let ρ = k R. With probability at most m 1, there is a row x i in X satisfying d(z opt, x i Qopt ) ɛk ρ > d(z opt R, x i R ). With probability at most m 1 3, there is a row x i in X satisfying d(z opt R, x i R ) > d(z opt, x i Qopt ) + ɛk. ρ

17 Lemma 3: When R Q opt and z R = z opt R, with probability at most 2m 1 3, the set of rows P = {p 1,..., p l } selected in Step 1 (a) of Algorithm 1 satisfies l i=1 d(z opt, x pi Qopt ) > H opt + 2ɛkl. Theorem 2: For any δ > 0, with probability at least 2 1 m 8c δ 2 c 2 (c+1) 2m 1 3, Algorithm 1 will output a solution with consensus score at most (1 + δ)h opt in O(nm O( 1 δ 2 ) ) time.

18 Thanks for Bottleneck Sub-matrix Problem Input: a n m matrix A, integers l and k. Output: a sub-matrix A P,Q of A with l rows and k columns and a consensus row z (of k elements) such that for any r i in P d(r i Q, z) d and d is minimized Here d(, ) is the Hamming distance.

19 Thanks Assumptions: d opt = MAX pi P opt d(x pi Qopt, z opt ) = O(k), d opt c = k and Q opt = k = O(n), k c = n. : (1) Use random sampling technique to know O(logm) bits of z opt and select l best rows like Algorithm 1. (2) Use linear programming and randomized rounding technique to select k columns in the matrix.

20 Thanks Linear programming Given a set of rows P = {p 1,..., p l }, we want to find a set of k columns Q and vector z such that bottleneck score is minimized. min d; Σ n y i,j = k, i=1 Σ j=1 j=1 y i,j 1, i = 1, 2,..., n, Σ n χ(π j, x ps,i)y i,j d, s = 1, 2,..., l. i=1 j=1 y i,j = 1 if and only if column i is in Q and the corresponding bit in z is π j. Here, for any a, b Σ, χ(a, b) = 0 if a = b and χ(a, b) = 1 if a b.

21 Thanks Randomized rounding To achieve two goals: (1) Select k columns, where k k δd opt. (2) Get integers values for y i,j such that the distance (restricted on the k selected columns) between any row in P and the center vector thus obtained is at most (1 + γ)d opt. Here δ > 0 and γ > 0 are two parameters used to control the errors.

22 Thanks Lemma 4: When nγ 2 3(cc ) 2 2 log m, for any γ, δ > 0, with probability at most exp( nδ2 ) + m 1, the rounding result 2(cc ) 2 y = {y 1,1,..., y 1, Σ,..., y n,1,..., y n, Σ } does not satisfy at least one of the following inequalities, Σ n ( y i,j) > k δd opt, i=1 j=1 and for every row x ps (s = 1, 2,..., l), Σ n ( χ(π j, x ps,i)y i,j) < d + γd opt. i=1 j=1

23 Thanks Algorithm 2 for The bottleneck Sub-matrix Problem Input: one matrix A Σ m n, integer l, k, a row z Σ n and small numbers ɛ > 0, γ > 0 and δ > 0. Output: a size l subset P of rows, a size k subset Q of columns and a length k consensus vector z. nγ 2 3(cc ) 2 if 2 log m then try all size k subset Q of the n columns and all z of length k to solve the problem. if > 2 log m then nγ 2 3(cc ) 2 Step 1: randomly select a set B of size subset R of B do 4 log m ɛ 2 4(c+1) log m columns from A. for every ɛ 2 for every z R Σ R do (a) Select the best l rows P = {p 1,..., p l } that minimize d(z R, x i R ). (b)solve the optimization problem by linear programming and randomized rounding to get Q and z. Step 2: Output P,Q and z with minimum bottleneck score d.

24 Thanks Lemma 5: When R Q opt and z R = z opt R, with probability at most 2m 1 3, the set of rows P = {p 1,..., p l } obtained in Step 1(a) of Algorithm 2 satisfies d(z opt, x pi Qopt ) > d opt + 2ɛk for some row x pi (1 i l). Theorem 3: With probability at least 1 m 2 ɛ 2 c 2 (c+1) 2m 1 3 exp( nδ2 ) m 1, Algorithm 2 2(cc ) 2 runs in time O(n O(1) m O( 1 ɛ γ 2 ) ) and obtains a solution with bottleneck score at most (1 + 2c ɛ + γ + δ)d opt for any fixed ɛ, γ, δ > 0.

25 Thanks Thanks Acknowledgements This work is fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 1070/02E]. This work is collaborated with Dr. Lusheng Wang and Xiaowen Liu in City University of Hong Kong, Hong Kong, China.

26 Thanks Let X 1, X 2,..., X n be n independent random 0-1 variables, where X i takes 1 with probability p i, 0 < p i < 1. Let X = n i=1 X i, and µ = E[X ]. Then for any 0 < ɛ 1, Pr(X > µ + ɛ n) < e 1 3 nɛ2, Pr(X < µ ɛ n) e 1 2 nɛ2.

Closest String and Closest Substring Problems

Closest String and Closest Substring Problems January 8, 2010 Problem Formulation Problem Statement I Closest String Given a set S = {s 1, s 2,, s n } of strings each length m, find a center string s of length m minimizing d such that for every string

More information

ACO Comprehensive Exam October 14 and 15, 2013

ACO Comprehensive Exam October 14 and 15, 2013 1. Computability, Complexity and Algorithms (a) Let G be the complete graph on n vertices, and let c : V (G) V (G) [0, ) be a symmetric cost function. Consider the following closest point heuristic for

More information

Fisher Equilibrium Price with a Class of Concave Utility Functions

Fisher Equilibrium Price with a Class of Concave Utility Functions Fisher Equilibrium Price with a Class of Concave Utility Functions Ning Chen 1, Xiaotie Deng 2, Xiaoming Sun 3, and Andrew Chi-Chih Yao 4 1 Department of Computer Science & Engineering, University of Washington

More information

Bounds for pairs in partitions of graphs

Bounds for pairs in partitions of graphs Bounds for pairs in partitions of graphs Jie Ma Xingxing Yu School of Mathematics Georgia Institute of Technology Atlanta, GA 30332-0160, USA Abstract In this paper we study the following problem of Bollobás

More information

CS6999 Probabilistic Methods in Integer Programming Randomized Rounding Andrew D. Smith April 2003

CS6999 Probabilistic Methods in Integer Programming Randomized Rounding Andrew D. Smith April 2003 CS6999 Probabilistic Methods in Integer Programming Randomized Rounding April 2003 Overview 2 Background Randomized Rounding Handling Feasibility Derandomization Advanced Techniques Integer Programming

More information

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT

More information

THE SZEMERÉDI REGULARITY LEMMA AND ITS APPLICATION

THE SZEMERÉDI REGULARITY LEMMA AND ITS APPLICATION THE SZEMERÉDI REGULARITY LEMMA AND ITS APPLICATION YAQIAO LI In this note we will prove Szemerédi s regularity lemma, and its application in proving the triangle removal lemma and the Roth s theorem on

More information

On the k-closest Substring and k-consensus Pattern Problems

On the k-closest Substring and k-consensus Pattern Problems On the k-closest Substring and k-consensus Pattern Problems Yishan Jiao 1,JingyiXu 1, and Ming Li 2 1 Bioinformatics lab, Institute of Computing Technology, Chinese Academy of Sciences, 6#, South Road,

More information

Distances and similarities Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Distances and similarities Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining Distances and similarities Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Similarities Start with X which we assume is centered and standardized. The PCA loadings were

More information

x 1 + x 2 2 x 1 x 2 1 x 2 2 min 3x 1 + 2x 2

x 1 + x 2 2 x 1 x 2 1 x 2 2 min 3x 1 + 2x 2 Lecture 1 LPs: Algebraic View 1.1 Introduction to Linear Programming Linear programs began to get a lot of attention in 1940 s, when people were interested in minimizing costs of various systems while

More information

6.842 Randomness and Computation Lecture 5

6.842 Randomness and Computation Lecture 5 6.842 Randomness and Computation 2012-02-22 Lecture 5 Lecturer: Ronitt Rubinfeld Scribe: Michael Forbes 1 Overview Today we will define the notion of a pairwise independent hash function, and discuss its

More information

Problem Set 1: Solutions Math 201A: Fall Problem 1. Let (X, d) be a metric space. (a) Prove the reverse triangle inequality: for every x, y, z X

Problem Set 1: Solutions Math 201A: Fall Problem 1. Let (X, d) be a metric space. (a) Prove the reverse triangle inequality: for every x, y, z X Problem Set 1: s Math 201A: Fall 2016 Problem 1. Let (X, d) be a metric space. (a) Prove the reverse triangle inequality: for every x, y, z X d(x, y) d(x, z) d(z, y). (b) Prove that if x n x and y n y

More information

Non-Interactive Zero Knowledge (II)

Non-Interactive Zero Knowledge (II) Non-Interactive Zero Knowledge (II) CS 601.442/642 Modern Cryptography Fall 2017 S 601.442/642 Modern CryptographyNon-Interactive Zero Knowledge (II) Fall 2017 1 / 18 NIZKs for NP: Roadmap Last-time: Transformation

More information

Matrices: 2.1 Operations with Matrices

Matrices: 2.1 Operations with Matrices Goals In this chapter and section we study matrix operations: Define matrix addition Define multiplication of matrix by a scalar, to be called scalar multiplication. Define multiplication of two matrices,

More information

On the complexity of unsigned translocation distance

On the complexity of unsigned translocation distance Theoretical Computer Science 352 (2006) 322 328 Note On the complexity of unsigned translocation distance Daming Zhu a, Lusheng Wang b, a School of Computer Science and Technology, Shandong University,

More information

Fast Algorithms for Constant Approximation k-means Clustering

Fast Algorithms for Constant Approximation k-means Clustering Transactions on Machine Learning and Data Mining Vol. 3, No. 2 (2010) 67-79 c ISSN:1865-6781 (Journal), ISBN: 978-3-940501-19-6, IBaI Publishing ISSN 1864-9734 Fast Algorithms for Constant Approximation

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms What do you do when a problem is NP-complete? or, when the polynomial time solution is impractically slow? assume input is random, do expected performance. Eg, Hamiltonian path

More information

The Algorithmic Aspects of the Regularity Lemma

The Algorithmic Aspects of the Regularity Lemma The Algorithmic Aspects of the Regularity Lemma N. Alon R. A. Duke H. Lefmann V. Rödl R. Yuster Abstract The Regularity Lemma of Szemerédi is a result that asserts that every graph can be partitioned in

More information

On approximation of real, complex, and p-adic numbers by algebraic numbers of bounded degree. by K. I. Tsishchanka

On approximation of real, complex, and p-adic numbers by algebraic numbers of bounded degree. by K. I. Tsishchanka On approximation of real, complex, and p-adic numbers by algebraic numbers of bounded degree by K. I. Tsishchanka I. On approximation by rational numbers Theorem 1 (Dirichlet, 1842). For any real irrational

More information

Topics in Approximation Algorithms Solution for Homework 3

Topics in Approximation Algorithms Solution for Homework 3 Topics in Approximation Algorithms Solution for Homework 3 Problem 1 We show that any solution {U t } can be modified to satisfy U τ L τ as follows. Suppose U τ L τ, so there is a vertex v U τ but v L

More information

Reductions. Reduction. Linear Time Reduction: Examples. Linear Time Reductions

Reductions. Reduction. Linear Time Reduction: Examples. Linear Time Reductions Reduction Reductions Problem X reduces to problem Y if given a subroutine for Y, can solve X. Cost of solving X = cost of solving Y + cost of reduction. May call subroutine for Y more than once. Ex: X

More information

1 The Knapsack Problem

1 The Knapsack Problem Comp 260: Advanced Algorithms Prof. Lenore Cowen Tufts University, Spring 2018 Scribe: Tom Magerlein 1 Lecture 4: The Knapsack Problem 1 The Knapsack Problem Suppose we are trying to burgle someone s house.

More information

Approximate Counting and Markov Chain Monte Carlo

Approximate Counting and Markov Chain Monte Carlo Approximate Counting and Markov Chain Monte Carlo A Randomized Approach Arindam Pal Department of Computer Science and Engineering Indian Institute of Technology Delhi March 18, 2011 April 8, 2011 Arindam

More information

A conjecture on the alphabet size needed to produce all correlation classes of pairs of words

A conjecture on the alphabet size needed to produce all correlation classes of pairs of words A conjecture on the alphabet size needed to produce all correlation classes of pairs of words Paul Leopardi Thanks: Jörg Arndt, Michael Barnsley, Richard Brent, Sylvain Forêt, Judy-anne Osborn. Mathematical

More information

ICALP g Mingji Xia

ICALP g Mingji Xia : : ICALP 2010 g Institute of Software Chinese Academy of Sciences 2010.7.6 : 1 2 3 Matrix multiplication : Product of vectors and matrices. AC = e [n] a ec e A = (a e), C = (c e). ABC = e,e [n] a eb ee

More information

Integer Factorization using lattices

Integer Factorization using lattices Integer Factorization using lattices Antonio Vera INRIA Nancy/CARAMEL team/anr CADO/ANR LAREDA Workshop Lattice Algorithmics - CIRM - February 2010 Plan Introduction Plan Introduction Outline of the algorithm

More information

Statistics and data analyses

Statistics and data analyses Statistics and data analyses Designing experiments Measuring time Instrumental quality Precision Standard deviation depends on Number of measurements Detection quality Systematics and methology σ tot =

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1 Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of

More information

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with

More information

Factoring Algorithms Pollard s p 1 Method. This method discovers a prime factor p of an integer n whenever p 1 has only small prime factors.

Factoring Algorithms Pollard s p 1 Method. This method discovers a prime factor p of an integer n whenever p 1 has only small prime factors. Factoring Algorithms Pollard s p 1 Method This method discovers a prime factor p of an integer n whenever p 1 has only small prime factors. Input: n (to factor) and a limit B Output: a proper factor of

More information

SZEMERÉDI S REGULARITY LEMMA FOR MATRICES AND SPARSE GRAPHS

SZEMERÉDI S REGULARITY LEMMA FOR MATRICES AND SPARSE GRAPHS SZEMERÉDI S REGULARITY LEMMA FOR MATRICES AND SPARSE GRAPHS ALEXANDER SCOTT Abstract. Szemerédi s Regularity Lemma is an important tool for analyzing the structure of dense graphs. There are versions of

More information

Approximation Algorithms and Hardness of Approximation. IPM, Jan Mohammad R. Salavatipour Department of Computing Science University of Alberta

Approximation Algorithms and Hardness of Approximation. IPM, Jan Mohammad R. Salavatipour Department of Computing Science University of Alberta Approximation Algorithms and Hardness of Approximation IPM, Jan 2006 Mohammad R. Salavatipour Department of Computing Science University of Alberta 1 Introduction For NP-hard optimization problems, we

More information

A Polytope for a Product of Real Linear Functions in 0/1 Variables

A Polytope for a Product of Real Linear Functions in 0/1 Variables Don Coppersmith Oktay Günlük Jon Lee Janny Leung A Polytope for a Product of Real Linear Functions in 0/1 Variables Original: 29 September 1999 as IBM Research Report RC21568 Revised version: 30 November

More information

Lecture 2. MATH3220 Operations Research and Logistics Jan. 8, Pan Li The Chinese University of Hong Kong. Integer Programming Formulations

Lecture 2. MATH3220 Operations Research and Logistics Jan. 8, Pan Li The Chinese University of Hong Kong. Integer Programming Formulations Lecture 2 MATH3220 Operations Research and Logistics Jan. 8, 2015 Pan Li The Chinese University of Hong Kong 2.1 Agenda 1 2 3 2.2 : a linear program plus the additional constraints that some or all of

More information

Lecture 5: Two-point Sampling

Lecture 5: Two-point Sampling Randomized Algorithms Lecture 5: Two-point Sampling Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Randomized Algorithms - Lecture 5 1 / 26 Overview A. Pairwise

More information

Improved Parallel Approximation of a Class of Integer Programming Problems

Improved Parallel Approximation of a Class of Integer Programming Problems Improved Parallel Approximation of a Class of Integer Programming Problems Noga Alon 1 and Aravind Srinivasan 2 1 School of Mathematical Sciences, Raymond and Beverly Sackler Faculty of Exact Sciences,

More information

Math 1302 Notes 2. How many solutions? What type of solution in the real number system? What kind of equation is it?

Math 1302 Notes 2. How many solutions? What type of solution in the real number system? What kind of equation is it? Math 1302 Notes 2 We know that x 2 + 4 = 0 has How many solutions? What type of solution in the real number system? What kind of equation is it? What happens if we enlarge our current system? Remember

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

Probability and Samples. Sampling. Point Estimates

Probability and Samples. Sampling. Point Estimates Probability and Samples Sampling We want the results from our sample to be true for the population and not just the sample But our sample may or may not be representative of the population Sampling error

More information

ON COST MATRICES WITH TWO AND THREE DISTINCT VALUES OF HAMILTONIAN PATHS AND CYCLES

ON COST MATRICES WITH TWO AND THREE DISTINCT VALUES OF HAMILTONIAN PATHS AND CYCLES ON COST MATRICES WITH TWO AND THREE DISTINCT VALUES OF HAMILTONIAN PATHS AND CYCLES SANTOSH N. KABADI AND ABRAHAM P. PUNNEN Abstract. Polynomially testable characterization of cost matrices associated

More information

Analysis and Design of Algorithms Dynamic Programming

Analysis and Design of Algorithms Dynamic Programming Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................

More information

Integer Linear Programs

Integer Linear Programs Lecture 2: Review, Linear Programming Relaxations Today we will talk about expressing combinatorial problems as mathematical programs, specifically Integer Linear Programs (ILPs). We then see what happens

More information

RANK AND PERIMETER PRESERVER OF RANK-1 MATRICES OVER MAX ALGEBRA

RANK AND PERIMETER PRESERVER OF RANK-1 MATRICES OVER MAX ALGEBRA Discussiones Mathematicae General Algebra and Applications 23 (2003 ) 125 137 RANK AND PERIMETER PRESERVER OF RANK-1 MATRICES OVER MAX ALGEBRA Seok-Zun Song and Kyung-Tae Kang Department of Mathematics,

More information

HEURISTICS FOR TWO-MACHINE FLOWSHOP SCHEDULING WITH SETUP TIMES AND AN AVAILABILITY CONSTRAINT

HEURISTICS FOR TWO-MACHINE FLOWSHOP SCHEDULING WITH SETUP TIMES AND AN AVAILABILITY CONSTRAINT HEURISTICS FOR TWO-MACHINE FLOWSHOP SCHEDULING WITH SETUP TIMES AND AN AVAILABILITY CONSTRAINT Wei Cheng Health Monitor Network, Parmus, NJ John Karlof Department of Mathematics and Statistics University

More information

Technische Universität München, Zentrum Mathematik Lehrstuhl für Angewandte Geometrie und Diskrete Mathematik. Combinatorial Optimization (MA 4502)

Technische Universität München, Zentrum Mathematik Lehrstuhl für Angewandte Geometrie und Diskrete Mathematik. Combinatorial Optimization (MA 4502) Technische Universität München, Zentrum Mathematik Lehrstuhl für Angewandte Geometrie und Diskrete Mathematik Combinatorial Optimization (MA 4502) Dr. Michael Ritter Problem Sheet 1 Homework Problems Exercise

More information

Zero-sum square matrices

Zero-sum square matrices Zero-sum square matrices Paul Balister Yair Caro Cecil Rousseau Raphael Yuster Abstract Let A be a matrix over the integers, and let p be a positive integer. A submatrix B of A is zero-sum mod p if the

More information

Random Variable. Pr(X = a) = Pr(s)

Random Variable. Pr(X = a) = Pr(s) Random Variable Definition A random variable X on a sample space Ω is a real-valued function on Ω; that is, X : Ω R. A discrete random variable is a random variable that takes on only a finite or countably

More information

A Note on the Density of the Multiple Subset Sum Problems

A Note on the Density of the Multiple Subset Sum Problems A Note on the Density of the Multiple Subset Sum Problems Yanbin Pan and Feng Zhang Key Laboratory of Mathematics Mechanization, Academy of Mathematics and Systems Science, Chinese Academy of Sciences,

More information

Scheduling on Unrelated Parallel Machines. Approximation Algorithms, V. V. Vazirani Book Chapter 17

Scheduling on Unrelated Parallel Machines. Approximation Algorithms, V. V. Vazirani Book Chapter 17 Scheduling on Unrelated Parallel Machines Approximation Algorithms, V. V. Vazirani Book Chapter 17 Nicolas Karakatsanis, 2008 Description of the problem Problem 17.1 (Scheduling on unrelated parallel machines)

More information

CS 372: Computational Geometry Lecture 14 Geometric Approximation Algorithms

CS 372: Computational Geometry Lecture 14 Geometric Approximation Algorithms CS 372: Computational Geometry Lecture 14 Geometric Approximation Algorithms Antoine Vigneron King Abdullah University of Science and Technology December 5, 2012 Antoine Vigneron (KAUST) CS 372 Lecture

More information

Diameter of the Zero Divisor Graph of Semiring of Matrices over Boolean Semiring

Diameter of the Zero Divisor Graph of Semiring of Matrices over Boolean Semiring International Mathematical Forum, Vol. 9, 2014, no. 29, 1369-1375 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2014.47131 Diameter of the Zero Divisor Graph of Semiring of Matrices over

More information

Lecture Semidefinite Programming and Graph Partitioning

Lecture Semidefinite Programming and Graph Partitioning Approximation Algorithms and Hardness of Approximation April 16, 013 Lecture 14 Lecturer: Alantha Newman Scribes: Marwa El Halabi 1 Semidefinite Programming and Graph Partitioning In previous lectures,

More information

Lecture 9 Nearest Neighbor Search: Locality Sensitive Hashing.

Lecture 9 Nearest Neighbor Search: Locality Sensitive Hashing. COMS 4995-3: Advanced Algorithms Feb 15, 2017 Lecture 9 Nearest Neighbor Search: Locality Sensitive Hashing. Instructor: Alex Andoni Scribes: Weston Jackson, Edo Roth 1 Introduction Today s lecture is

More information

Algorithms for Molecular Biology

Algorithms for Molecular Biology Algorithms for Molecular Biology BioMed Central Research A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series Sara C Madeira* 1,2,3 and Arlindo

More information

Assignment 2 : Probabilistic Methods

Assignment 2 : Probabilistic Methods Assignment 2 : Probabilistic Methods jverstra@math Question 1. Let d N, c R +, and let G n be an n-vertex d-regular graph. Suppose that vertices of G n are selected independently with probability p, where

More information

Matrices. Chapter Definitions and Notations

Matrices. Chapter Definitions and Notations Chapter 3 Matrices 3. Definitions and Notations Matrices are yet another mathematical object. Learning about matrices means learning what they are, how they are represented, the types of operations which

More information

Separation Techniques for Constrained Nonlinear 0 1 Programming

Separation Techniques for Constrained Nonlinear 0 1 Programming Separation Techniques for Constrained Nonlinear 0 1 Programming Christoph Buchheim Computer Science Department, University of Cologne and DEIS, University of Bologna MIP 2008, Columbia University, New

More information

Proofs Not Based On POMI

Proofs Not Based On POMI s Not Based On POMI James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University February 1, 018 Outline Non POMI Based s Some Contradiction s Triangle

More information

STABILITY OF JOHNSON S SCHEDULE WITH LIMITED MACHINE AVAILABILITY

STABILITY OF JOHNSON S SCHEDULE WITH LIMITED MACHINE AVAILABILITY MOSIM 01 du 25 au 27 avril 2001 Troyes (France) STABILITY OF JOHNSON S SCHEDULE WITH LIMITED MACHINE AVAILABILITY Oliver BRAUN, Günter SCHMIDT Department of Information and Technology Management Saarland

More information

Handout 5. α a1 a n. }, where. xi if a i = 1 1 if a i = 0.

Handout 5. α a1 a n. }, where. xi if a i = 1 1 if a i = 0. Notes on Complexity Theory Last updated: October, 2005 Jonathan Katz Handout 5 1 An Improved Upper-Bound on Circuit Size Here we show the result promised in the previous lecture regarding an upper-bound

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 08 Boolean Matrix Factorization Rainer Gemulla, Pauli Miettinen June 13, 2013 Outline 1 Warm-Up 2 What is BMF 3 BMF vs. other three-letter abbreviations 4 Binary matrices, tiles,

More information

Approximation Schemes for Job Shop Scheduling Problems with Controllable Processing Times

Approximation Schemes for Job Shop Scheduling Problems with Controllable Processing Times Approximation Schemes for Job Shop Scheduling Problems with Controllable Processing Times Klaus Jansen 1, Monaldo Mastrolilli 2, and Roberto Solis-Oba 3 1 Universität zu Kiel, Germany, kj@informatik.uni-kiel.de

More information

LP based heuristics for the multiple knapsack problem. with assignment restrictions

LP based heuristics for the multiple knapsack problem. with assignment restrictions LP based heuristics for the multiple knapsack problem with assignment restrictions Geir Dahl and Njål Foldnes Centre of Mathematics for Applications and Department of Informatics, University of Oslo, P.O.Box

More information

CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS

CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS Statistica Sinica 24 (2014), 1685-1702 doi:http://dx.doi.org/10.5705/ss.2013.239 CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS Mingyao Ai 1, Bochuan Jiang 1,2

More information

EE512: Error Control Coding

EE512: Error Control Coding EE512: Error Control Coding Solution for Assignment on Linear Block Codes February 14, 2007 1. Code 1: n = 4, n k = 2 Parity Check Equations: x 1 + x 3 = 0, x 1 + x 2 + x 4 = 0 Parity Bits: x 3 = x 1,

More information

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Markus L. Schmid Trier University, Fachbereich IV Abteilung Informatikwissenschaften, D-54286 Trier, Germany, MSchmid@uni-trier.de

More information

Lecture 9: Submatrices and Some Special Matrices

Lecture 9: Submatrices and Some Special Matrices Lecture 9: Submatrices and Some Special Matrices Winfried Just Department of Mathematics, Ohio University February 2, 2018 Submatrices A submatrix B of a given matrix A is any matrix that can be obtained

More information

COVARIANCE IDENTITIES AND MIXING OF RANDOM TRANSFORMATIONS ON THE WIENER SPACE

COVARIANCE IDENTITIES AND MIXING OF RANDOM TRANSFORMATIONS ON THE WIENER SPACE Communications on Stochastic Analysis Vol. 4, No. 3 (21) 299-39 Serials Publications www.serialspublications.com COVARIANCE IDENTITIES AND MIXING OF RANDOM TRANSFORMATIONS ON THE WIENER SPACE NICOLAS PRIVAULT

More information

CHAPTER 6. Direct Methods for Solving Linear Systems

CHAPTER 6. Direct Methods for Solving Linear Systems CHAPTER 6 Direct Methods for Solving Linear Systems. Introduction A direct method for approximating the solution of a system of n linear equations in n unknowns is one that gives the exact solution to

More information

Approximability of Dense Instances of Nearest Codeword Problem

Approximability of Dense Instances of Nearest Codeword Problem Approximability of Dense Instances of Nearest Codeword Problem Cristina Bazgan 1, W. Fernandez de la Vega 2, Marek Karpinski 3 1 Université Paris Dauphine, LAMSADE, 75016 Paris, France, bazgan@lamsade.dauphine.fr

More information

1 Column Generation and the Cutting Stock Problem

1 Column Generation and the Cutting Stock Problem 1 Column Generation and the Cutting Stock Problem In the linear programming approach to the traveling salesman problem we used the cutting plane approach. The cutting plane approach is appropriate when

More information

RECAP How to find a maximum matching?

RECAP How to find a maximum matching? RECAP How to find a maximum matching? First characterize maximum matchings A maximal matching cannot be enlarged by adding another edge. A maximum matching of G is one of maximum size. Example. Maximum

More information

Linear and Integer Programming - ideas

Linear and Integer Programming - ideas Linear and Integer Programming - ideas Paweł Zieliński Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland http://www.im.pwr.wroc.pl/ pziel/ Toulouse, France 2012 Literature

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment 1 Caramanis/Sanghavi Due: Thursday, Feb. 7, 2013. (Problems 1 and

More information

On The Computational Complexity Of Bi-Clustering

On The Computational Complexity Of Bi-Clustering On The Computational Complexity Of Bi-Clustering by Sharon Wulff A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer

More information

Linear Classification: Linear Programming

Linear Classification: Linear Programming Linear Classification: Linear Programming Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong 1 / 21 Y Tao Linear Classification: Linear Programming Recall the definition

More information

Lecture 4 Scheduling 1

Lecture 4 Scheduling 1 Lecture 4 Scheduling 1 Single machine models: Number of Tardy Jobs -1- Problem 1 U j : Structure of an optimal schedule: set S 1 of jobs meeting their due dates set S 2 of jobs being late jobs of S 1 are

More information

Consensus Optimizing Both Distance Sum and Radius

Consensus Optimizing Both Distance Sum and Radius Consensus Optimizing Both Distance Sum and Radius Amihood Amir 1, Gad M. Landau 2, Joong Chae Na 3, Heejin Park 4, Kunsoo Park 5, and Jeong Seop Sim 6 1 Bar-Ilan University, 52900 Ramat-Gan, Israel 2 University

More information

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a

More information

National University of Singapore Department of Electrical & Computer Engineering. Examination for

National University of Singapore Department of Electrical & Computer Engineering. Examination for National University of Singapore Department of Electrical & Computer Engineering Examination for EE5139R Information Theory for Communication Systems (Semester I, 2014/15) November/December 2014 Time Allowed:

More information

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria Source Coding Master Universitario en Ingeniería de Telecomunicación I. Santamaría Universidad de Cantabria Contents Introduction Asymptotic Equipartition Property Optimal Codes (Huffman Coding) Universal

More information

Algorithms for Data Science: Lecture on Finding Similar Items

Algorithms for Data Science: Lecture on Finding Similar Items Algorithms for Data Science: Lecture on Finding Similar Items Barna Saha 1 Finding Similar Items Finding similar items is a fundamental data mining task. We may want to find whether two documents are similar

More information

Sequential pairing of mixed integer inequalities

Sequential pairing of mixed integer inequalities Sequential pairing of mixed integer inequalities Yongpei Guan, Shabbir Ahmed, George L. Nemhauser School of Industrial & Systems Engineering, Georgia Institute of Technology, 765 Ferst Drive, Atlanta,

More information

Q(x n+1,..., x m ) in n independent variables

Q(x n+1,..., x m ) in n independent variables Cluster algebras of geometric type (Fomin / Zelevinsky) m n positive integers ambient field F of rational functions over Q(x n+1,..., x m ) in n independent variables Definition: A seed in F is a pair

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

A Branch-and-Cut Algorithm for the Stochastic Uncapacitated Lot-Sizing Problem

A Branch-and-Cut Algorithm for the Stochastic Uncapacitated Lot-Sizing Problem Yongpei Guan 1 Shabbir Ahmed 1 George L. Nemhauser 1 Andrew J. Miller 2 A Branch-and-Cut Algorithm for the Stochastic Uncapacitated Lot-Sizing Problem December 12, 2004 Abstract. This paper addresses a

More information

Bipartite Perfect Matching

Bipartite Perfect Matching Bipartite Perfect Matching We are given a bipartite graph G = (U, V, E). U = {u 1, u 2,..., u n }. V = {v 1, v 2,..., v n }. E U V. We are asked if there is a perfect matching. A permutation π of {1, 2,...,

More information

CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS

CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS Statistica Sinica 23 (2013), 1117-1130 doi:http://dx.doi.org/10.5705/ss.2012.037 CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS Jian-Feng Yang, C. Devon Lin, Peter Z. G. Qian and Dennis K. J.

More information

Solving the Order-Preserving Submatrix Problem via Integer Programming

Solving the Order-Preserving Submatrix Problem via Integer Programming Solving the Order-Preserving Submatrix Problem via Integer Programming Andrew C. Trapp, Oleg A. Prokopyev Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261,

More information

Faster Algorithms for some Optimization Problems on Collinear Points

Faster Algorithms for some Optimization Problems on Collinear Points Faster Algorithms for some Optimization Problems on Collinear Points Ahmad Biniaz Prosenjit Bose Paz Carmi Anil Maheshwari J. Ian Munro Michiel Smid June 29, 2018 Abstract We propose faster algorithms

More information

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and

More information

CS Homework Chapter 6 ( 6.14 )

CS Homework Chapter 6 ( 6.14 ) CS50 - Homework Chapter 6 ( 6. Dan Li, Xiaohui Kong, Hammad Ibqal and Ihsan A. Qazi Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 560 Intelligent Systems Program, University

More information

Approximations for MAX-SAT Problem

Approximations for MAX-SAT Problem Approximations for MAX-SAT Problem Chihao Zhang BASICS, Shanghai Jiao Tong University Oct. 23, 2012 Chihao Zhang (BASICS@SJTU) Approximations for MAX-SAT Problem Oct. 23, 2012 1 / 16 The Weighted MAX-SAT

More information

Concentration function and other stuff

Concentration function and other stuff Concentration function and other stuff Sabrina Sixta Tuesday, June 16, 2014 Sabrina Sixta () Concentration function and other stuff Tuesday, June 16, 2014 1 / 13 Table of Contents Outline 1 Chernoff Bound

More information

Approximations for MAX-SAT Problem

Approximations for MAX-SAT Problem Approximations for MAX-SAT Problem Chihao Zhang BASICS, Shanghai Jiao Tong University Oct. 23, 2012 Chihao Zhang (BASICS@SJTU) Approximations for MAX-SAT Problem Oct. 23, 2012 1 / 16 The Weighted MAX-SAT

More information

CHAPTER 10 Shape Preserving Properties of B-splines

CHAPTER 10 Shape Preserving Properties of B-splines CHAPTER 10 Shape Preserving Properties of B-splines In earlier chapters we have seen a number of examples of the close relationship between a spline function and its B-spline coefficients This is especially

More information

5 Integer Linear Programming (ILP) E. Amaldi Foundations of Operations Research Politecnico di Milano 1

5 Integer Linear Programming (ILP) E. Amaldi Foundations of Operations Research Politecnico di Milano 1 5 Integer Linear Programming (ILP) E. Amaldi Foundations of Operations Research Politecnico di Milano 1 Definition: An Integer Linear Programming problem is an optimization problem of the form (ILP) min

More information

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch

More information

Conference Program Design with Single-Peaked and Single-Crossing Preferences

Conference Program Design with Single-Peaked and Single-Crossing Preferences Conference Program Design with Single-Peaked and Single-Crossing Preferences Dimitris Fotakis 1, Laurent Gourvès 2, and Jérôme Monnot 2 1 School of Electrical and Computer Engineering, National Technical

More information

Applied Mathematics Letters

Applied Mathematics Letters Applied Mathematics Letters (009) 15 130 Contents lists available at ScienceDirect Applied Mathematics Letters journal homepage: www.elsevier.com/locate/aml Spectral characterizations of sandglass graphs

More information