Evaluating Data Minability Through Compression An Experimental Study

Size: px
Start display at page:

Download "Evaluating Data Minability Through Compression An Experimental Study"

Transcription

1 Evaluatig Data Miability Through Compressio A Experimetal Study Da Simovici Uiv. of Massachusetts Bosto, Bosto, USA, dsim at cs.umb.edu Da Pletea Uiv. of Massachusetts Bosto, Bosto, USA, dpletea at cs.umb.edu Saaid Baraty Uiv. of Massachusetts Bosto, Bosto, USA, sbaraty at cs.umb.edu Abstract The effectiveess of compressio algorithms is icreasig as the data subjected to compressio cotais repetitive patters. This basic idea is used to detect the existece of regularities i various types of data ragig from market basket data to udirected graphs. The results are quite idepedet of the particular algorithms used for compressio ad offer a idicatio of the potetial of discoverig patters i data before the actual miig process takes place. Keywords-data miig; lossless compressio; LZW; market basket data; patters; Kroecker product. I. INTRODUCTION Our goal is to show that compressio ca be used as a tool to evaluate the potetial of a data set of producig iterestig results i a data miig process. The basic idea that data that displays repetitive patters or patters that occur with a certai regularity will be compressed more efficietly compared to data that has o such characteristics. Thus, a pre-processig phase of the miig process should allow to decide whether a data set is worth miig, or compare the iterestigess of applyig miig algorithms to several data sets. Sice compressio is geerally iexpesive ad compressio methods are well-studied ad uderstood, pre-miig usig compressio will help data miig aalysts to focus their efforts o miig resources that ca provide a highest payout without a exorbitat cost. Compressio has received lots of attetio i the data miig literature. As observed by Maila [7], data compressio ca be regarded as oe of the fudametal approaches to data miig [7], sice the goal of the data miig is to compress data by fidig some structure i it. The role of compressio developig parameter-free data miig algorithms i aomaly detectio, classificatio ad clusterig was examied i [4]. The size C(x) of a compressed file x is as a approximatio of Kolmogorov complexity [2] ad allows the defiitio of a pseudo-distace betwee two files x ad y as d(x, y) = C(xy) C(x) + C(y). Further advaces i this directio were developed i [8][5][6]. A Kolmogorov complexity-based dissimilarity was successfully used to texture matchig problems i [1] which have a broad spectrum of applicatios i areas like bioiformatics, atural laguages. ad music. We illustrate the use of lossless compressio i pre-miig data by focusig o several distict data miig processes: files with frequet patters, frequet itemsets i market basket data, ad explorig similarity of graphs. The LZW (Lempel-Ziv-Welch) algorithm was itroduced i 1984 by T. Welch i [9] ad is amog the most popular compressio techiques. The algorithm does ot eed to check all the data before startig the compressio ad the performace is based o the umber of the repetitios ad the legths of the strigs ad the ratio of 0s/1s or true/false at the bit level. There are several versios of the LZW algorithm. Popular programs (such as Wizip) use variatios of the LZW compressio. The Wizip/Zip type of algorithms also work at the bit level ad ot at a charater/byte level. We explore three experimetal settigs that provide strog empirical evidece of the correlatio betwee compressio ratio ad the existece of hidde patters i data. I Sectio II, we compress biary strigs that cotai patters; i Sectio III, we study the compressibility of adjacecy matrix for graphs relative to the etropy of distributio of subgraphs. Fially, i Sectio IV, we examie the compressibility of files that cotai market basket data sets. II. PATTERNS IN STRINGS AND COMPRESSION Let A be the set of strigs o the alphabet A. The legth of a strig w is deoted by w. The ull strig o A is deoted by λ ad we defie A + as A + = A {λ}. If w A ca be writte as w = utv, where u, v A ad t A +, we say that the pair (t, m) is a occurrece of t i w, where m is the legth of u. The occurreces (x, m) ad (y, p) are overlappig if p < m + x. If this is the case, there is a proper suffix of x that equals a proper prefix of y. If t is a word such that the sets of its proper prefixes ad its proper suffixes are disjoit, there are o overlappig occurreces of x i ay word. The umber of occurreces of a strig t i a strig w is deoted by t (w). Clearly, we have { a (w) a A} = w. The prevalece of t i w is the umber f t (w) = t(w) t w which gives the ratio of the characters cotaied i the occurreces of t relative to the total umber of characters i the strig.

2 The result of applyig a compressio algorithm C to a strig w A is deoted by C(w) ad the compressio ratio is the umber CR C (w) = C(w). w I this sectio, we shall use the biary alphabet B = {0, 1} ad the LZW algorithm or the compressio algorithm of the package java.util.zip. We geerated radom strigs of bits (0s ad 1s) ad computed the compressio ratio strigs with a variety of symbol distributios. A strig w that cotais oly 0s (or oly 1s) achieves a very good compressio ratio of CR jzip (w) = for 100K bits ad CR jzip = for 500K bits, where jzip deotes the compressio algorithm from the package java.util.zip. Figure 1 shows, as expected, that the worst compressio ratio is achieved whe 0s ad 1s occur with equal frequecies. Table I PATTERN 001 PREVALENCE VERSUS THE CR jzip Prevalece of CR jzip Baselie 001 patter 0% % % % % % % % % % % % Figure 2. Variatio of compressio rate depeds o the prevalece of the patter 001 Figure 1. Baselie CR jzip Behavior For strigs of small legth (less tha 10 4 bits) the compressio ratio may exceed 1 because of the overhead itroduced by the algorithm. However, whe the size of the radom strig exceeds 10 6 bits this pheomeo disappears ad the compressio ratio depeds oly o the prevalece of the bits ad is relatively idepedet o the size of the file. Thus, i Figure 1, the curves that correspod to files of size 10 6 ad overlap. We refer to the compressio ratio of a radom strig w with a ( 0 (w), 1 (w)) distributio as the baselie compressio ratio. We created a series of biary strigs ϕ t,m which have a miimum guarateed umber m of occurreces of patters t {0, 1} k, where 0 m 100. Specifically, we created 101 files ϕ 001,m for the patter 001, each cotaiig 100K bits ad we geerated similar series for t {01, 0010, 00010}. The compressio ratio is show i Figure 2. The compressio ratio starts at a value of 0.94 ad after the prevalece of the patter becomes more frequet tha 20% the compressio ratio drops dramatically. Results of the experimet are show i Table I ad i Figure 3. Figure 3. Depedecy of Compressio Ratio o Patter Prevalece We coclude that the presece of repeated patters i strigs leads to a high degree of compressio (that is, to low compressio ratios). Thus, a low compressio ratio for a file idicates that the miig process may produce iterestig results. III. RANDOM INSERTION AND COMPRESSION For a matrix M {0, 1} u v deote by i (M) the umber of etries of M that equal i, where i {0, 1}. Clearly, we have 0 (B)+ 1 (B) = uv. For a radom variable V which

3 rages over the set of matrices {0, 1} u v let ν i (V ) be the radom variable whose values equal the umber of etries of V that equal i, where i {0, 1}. Let A {0, 1} p q be a 0/1 matrix ad let ( ) B1 B B : 2 B k, p 1 p 2 p k be a matrix-valued radom variable where B j R r s, p j 0 for 1 j k, ad k j=1 p j = 1. Defiitio 3.1: The radom variable A B obtaied by the isertio of B ito A is give by a 11 B... a 1 B A B =..... R mr s a m1 B... a m B I other words, the etries of A B are obtaied by substitutig the block a ij B l with the probability p l for a ij i A. Note that this operatio is a probabilistic geeralizatio of Kroecker s product for if ( ) B1 B :, 1 the A B has as its uique value the Kroecker product A B. The expected umber of 1s i the isertio A B is k E[ν 1 (A B)] = 1 (A) 1 (B j )p j j=1 Whe 1 (B 1 ) = = 1 (B k ) =, we have E[ν 1 (A B)] = 1 (A). I the experimet that ivolves isertio, we used a matrix-valued radom variable such that 1 (B 1 ) = = 1 (B k ) =. Thus, the variability of the values of A B is caused by the variability of the cotets of the matrices B 1,..., B k which ca be evaluated usig the etropy of the distributio of B, k H(B) = p j log 2 p j. j=1 We expect to obtai a strog positive correlatio betwee the etropy of B ad the degree of compressio achieved o the file that represets the matrix A B, ad the experimets support this expectatio. I a first series of compressios, we worked with a matrix A {0, 1} ad with a matrix-valued radom variable ( ) B1 B B : 2 B 3, p 1 p 2 p 3 where B j {0, 1} 3 3, ad 1 (B 1 ) = 1 (B 2 ) = 1 (B 3 ) = 4. Several probability distributios were cosidered, as show i Table II. Values of A B had = etries. I Table II, we had 39% 1s ad the baselie compressio rate for a biary file with this ratio of 1s is We also computed the correlatio betwee the CR jzip ad the Shao etropy of the probability distributio ad obtaied the value for 3 matrices. I Table III, we did the same experimet but with 4 differet matrices of 4 4. A strog correlatio (0.992.) was observed betwee CR jzip ad the Shao etropy of the probability distributio. Table II MATRIX INSERTIONS, ENTROPY AND COMPRESSION RATIOS Probability distributio CR jzip Shao Etropy (0, 1, 0) (1, 0, 0) (0, 0, 1) (0.2, 0.2, 0.6) (0.6, 0.2, 0.2) (0.33, 0.33, 0.34) (0, 0.3, 0.7) (0.9, 0.1, 0) (0.8, 0, 0.2) (0.49, 0.25, 0.26) (0.15, 0.35, 0.5) Table III KRONECKER PRODUCT AND PROBABILITY DISTRIBUTION FOR 4 MATRICES Probability distributio CR jzip Shao Etropy (0, 1,0, 0) (0, 1,0, 0) (0.2, 0.2, 0.2, 0.4) (0.25, 0.25, 0.25, 0.25) (0.4, 0, 0.2, 0.4) (0.3, 0.1, 0.2, 0.4) (0.45, 0.12, 0.22, 0.21) Figure 4. Evolutio CR jzip ad Shao Etropy of Probability Distributio. I Figure 4, we have the evolutio of CR jzip o the y axis ad o the x axis the Shao Etropy of the probability distributio for both experimets. We ca see clearly the liear correlatio betwee the two.

4 This experimet proves us agai that i case of repetitios/patters the CR jzip is better tha i the case of radomly geerated files. Next, we examie the compressibility of biary square matrices ad its relatioship with the distributio of pricipal submatrices. A biary square matrix is compressed by first vectorizig the matrix ad the compressig the biary sequece. The issue is relevat i graph theory, where the pricipal submatrices of the adjacecy matrix of a graph correspod to the adjacecy matrices of the subgraphs of that graph. The patters i a graph are captured i the form of frequet isomorphic subgraphs. There is a strog correlatio betwee the compressio ratio of the adjacecy matrix of a graph ad the frequecies of the occurreces of isomorphic subgraphs of it. Specifically, the lower the compressio ratio is, the higher are the frequecies of isomorphic subgraphs ad hece the worthier is the graph for beig mied. Let G be a udirected graph havig {v 1,..., v } as its set of odes. The adjacecy matrix of G, A G {0, 1} is defied as { 1 if there is a edge betwee v i ad v j i G (A G ) ij = 0 otherwise. We deote with CR C (A G ) the compressio ratio of the adjacecy matrix of graph G obtaied by applyig the compressio algorithm C. Defie the pricipal subcompoet of matrix A G with respect to the set of idices S = {s 1,..., s k } {1, 2,..., } to be the k k matrix A G (S) such that 1 if there is a edge betwee v si ad v sj A G (S) ij = i G 0 otherwise. The matrix A G (S) is the adjacecy matrix of the subgraph of G which cosists of the odes with idices i S alog with those edges that coect these odes. We deote by P (k) the collectio of all subsets of {1, 2,..., } of size k where 2 k. We have P (k) = ( k). Let (M k 1,..., M k l k ) be a eumeratio of possible adjacecy matrices of graphs with k odes where l k = 2 k(k 1) 2. We defie the fiite probability distributio ( ) k 1 P(G, k) = (G ) P (k),..., k l k (G ), P (k) where k i (G ) for 1 i l k is the umber of subgraphs of G with adjacecy matrix M k i. The Shao etropy of this probability distributio is: H P (G, k) = l k i=1 k i (G ) P (k) log k i (G ) 2 P (k). If H P (G, k) is low, there are to be fewer ad larger sets of isomorphic subgraphs of G of size k. I other words, small values of H P (G, k) for various values of k suggest that the graph G cotais repeated patters ad is susceptible to produce iterestig results. Note that although two isomorphic subgraphs do ot ecessarily have the same adjacecy matrix, the umber H P (G, k) is a good idicator of the frequecy of isomorphic subgraphs ad hece subgraph patters. We evaluated the correlatio betwee CR jzip (A G ) ad H P (G, k) for differet values of k. As expected, the compressio ratio of the adjacecy matrix ad the distributio etropy of graphs are roughly the same for isomorphic graphs, so both umbers are characteristic for a isomorphism type. If is a permutatio of the vertices of G, the adjacecy matrix of the graph G obtaied by applyig the permutatio is defied by A G is give by A G = P A G P 1. We compute this adjacecy matrix of A G, the etropy H P (G, k) the compressio ratio CR jzip (A G ) for several values of k ad permutatios. We radomly geerated graphs with = 60 odes ad various umber of edges ragig from 5 to For each geerated graph, we radomly produced twety permutatios of its set of odes ad computed H P (G, k) ad CR jzip (A G ). Fially, for each graph we calculated the ratio of stadard deviatio over average for the computed compressio ratios, followed by the same computatio for distributio etropies. The results of this experimet are show i Figures 5 ad 6 agaist the umber of edges. As it ca be see, the deviatio over mea of the compressio ratios for = 60 does ot exceed the umber Also, the deviatio over average of the distributio etropies for various values of k do ot exceed I particular, the deviatio of the distributio etropy for the graphs of 100 to 1500 edges falls below 0.001, which allows us to coclude that the deviatios of both compressio ratio ad distributio etropy with respect to isomorphisms are egligible. For each k {3, 4, 5}, we geerated radomly 560 graphs havig 60 vertices ad sets of edges whose size were varyig from 10 to The, the umbers H P (G, k) ad CR jzip (A G ) were computed. Figure 7 captures the results of the experimet. Each plot cotais two curves. The first curve represets the chages i average CR jzip (A G ) for forty radomly geerated graphs of equal umber of edges. The secod curve represets the variatio of the average H P (G, k) for the same forty graphs. The treds of these two curves are very similar for differet values of k. Table IV cotais the correlatio betwee CR jzip (A G ) ad H P (G, k) calculated for the 560 radomly geerated graphs for each value of k.

5 Figure 5. Stadard deviatio vs. average of the CR jzip (A G ) for a umber of differet permutatios of odes for the same graph. The horizotal axis is labelled with the umber of edges of the graph. = 60 ad k = 3 Figure 6. Stadard deviatio vs. average of the H P (G, k) of a umber of differet permutatios of odes for the same graph. The horizotal axis is labelled with the umber of edges of the graph. Each curve correspods to oe value of k. IV. FREQUENT ITEMS SETS AND COMPRESSION RATIO = 60 ad k = 4 A market basket data set cosists of a multiset T of trasactios. Each trasactio t is a subset of a set of items I = {i 1,...,i N }. A trasactio is described by its characteristic N-tuple t = (t 1,..., t N ), where { 1 if i k t. t k = 0 otherwise, for 1 k N. The legth of a trasactio t is t = N k=0 t k, while the average size of trasactios is { t t i T } T. The support of a set of items K of the data set T is {t T K t the umber supp(k) = T. The set of items K is s-frequet if supp(k) > s. The study of market basket data sets is cocered with the idetificatio of associatio rules. A pair of item sets (X, Y ) is a associatio rule. Its support, supp(x Y ) equals supp(x) ad its cofidece cof(x Y ) is defied as cof(x Y ) = supp(xy ) supp(x). = 60 ad k = 5 Figure 7. Plots of average CR jzip (A G ) (CMP RTIO) ad average H P (G, k) (DIST ENT) for radomly geerated graphs G of equal umber of edges with respect to the umber of edges.

6 Table IV CORRELATIONS BETWEEN CR jzip (A G ) AND H P (G, k) k Correlatio Usig the artificial trasactio ARMier geerator described i [3], we created a basket data set. Trasactios are represeted by sequeces of bits (t 1,, t N ). The multiset of M trasactios was represeted as a biary strig of legth M N obtaied by cocateatig the strigs that represet trasactios. We geerated files with 1000 trasactios, with 100 items available i the basket, addig up to 100K bits. For data sets havig the same umber of items ad trasactios, the efficiecy of the compressio icreases whe the umber of patters is lower (causig more repetitios). I a experimet with a average size of a frequet item set equal to 10, the average size of a trasactio equal to 15, ad the umber of frequet item sets varyig i the set {5, 10, 20, 30, 50, 75, 100, 200, 500, 1000}, the compressio ratio had a sigificat variatio ragig betwee 0.20 ad 0.75, as show i Table V. The correlatio betwee the umber of patters ad CR was Although the frequecy of 1s ad baselie compressio ratio were roughly costat (at 0.75), the umber of patters ad compressio ratio were correlated. Table V NUMBER OF ASSOCIATION RULES AT 0.05 SUPPORT LEVEL AND 0.9 CONFIDENCE Number of Patters Frequecy of 1s Baselie Compressio Number of assoc. compressio ratio rules 5 16% ,128, % ,539, % ,233, % , % ,910, % , % , % % % ivestigatios show that idetifyig compressible areas of huma DNA is a useful tool for detectig areas where the gee replicatio mechaisms are disturbed (a pheomeo that occurs i certai geetically based diseases. REFERENCES [1] B. J. L. Campaa ad E. J. Keogh. A compressio based distace measure for texture. I SDM, pages , [2] R. Cilibrasi ad P. M. B. Vitayi. Clusterig by compressio. IEEE Trasactios o Iformatio Theory, 51: , [3] L. Cristofor. ARMier project, [4] E. Keogh, S. Loardi, ad C. A. Rataamahataa. Towards parameter-free data miig. I Proc. 10th ACM SIGKDD Itl Cof. Kowledge Discovery ad Data Miig, pages ACM Press, [5] E. Keogh, S. Loardi, C. A. Rataamahataa, L. Wei, S. Lee, ad J. Hadley. Compressio-based data miig of sequetial data. Data Miig ad Kowledge Discovery, 14:99 129, [6] E. J. Keogh, L. Keogh, ad J. Hadley. Compressiobased data miig. I Ecyclopedia of Data Warehousig ad Miig, pages [7] H. Maila. Theoretical frameworks for data miig. SIGKDD Exploratio, 1:30 32, [8] L. Wei, J. Hadley, N. Marti, T. Su, ad E. J. Keogh. Clusterig workflow requiremets usig compressio dissimilarity measure. I ICDM Workshops, pages 50 54, [9] T. Welch. A techique for high performace data compressio. IEEE Computer, 17:8 19, Further, there was a strog egative correlatio (-0.92) betwee the compressio ratio ad the umber of associatio rules idicatig that market basket data sets that satisfy may associatio rules are very compressible V. CONCLUDING REMARKS Compressio ratio of a file ca be computed fast ad easy, ad i may cases offers a cheap way of predictig the existece of embedded patters i data. Thus, it becomes possible to obtai a approximative estimatio of the usefuless of a i-depth exploratio of a data set usig more sophisticated ad expesive algorithms. The use of compressio as a measure of miability is illustrated o a variety of paradigms: graph data, market basket data, etc. Recet

Evaluating Data Minability Through Compression An Experimental Study

Evaluating Data Minability Through Compression An Experimental Study Evaluatig Data Miability Through Compressio A Experimetal Study Da Simovici Uiv. of Massachusetts Bosto, Bosto, USA, dsim at cs.umb.edu Da Pletea Uiv. of Massachusetts Bosto, Bosto, USA, dpletea at cs.umb.edu

More information

Information Theory and Statistics Lecture 4: Lempel-Ziv code

Information Theory and Statistics Lecture 4: Lempel-Ziv code Iformatio Theory ad Statistics Lecture 4: Lempel-Ziv code Łukasz Dębowski ldebowsk@ipipa.waw.pl Ph. D. Programme 203/204 Etropy rate is the limitig compressio rate Theorem For a statioary process (X i)

More information

UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 17 Lecturer: David Wagner April 3, Notes 17 for CS 170

UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 17 Lecturer: David Wagner April 3, Notes 17 for CS 170 UC Berkeley CS 170: Efficiet Algorithms ad Itractable Problems Hadout 17 Lecturer: David Wager April 3, 2003 Notes 17 for CS 170 1 The Lempel-Ziv algorithm There is a sese i which the Huffma codig was

More information

Lecture 14: Graph Entropy

Lecture 14: Graph Entropy 15-859: Iformatio Theory ad Applicatios i TCS Sprig 2013 Lecture 14: Graph Etropy March 19, 2013 Lecturer: Mahdi Cheraghchi Scribe: Euiwoog Lee 1 Recap Bergma s boud o the permaet Shearer s Lemma Number

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

( ) = p and P( i = b) = q.

( ) = p and P( i = b) = q. MATH 540 Radom Walks Part 1 A radom walk X is special stochastic process that measures the height (or value) of a particle that radomly moves upward or dowward certai fixed amouts o each uit icremet of

More information

The Boolean Ring of Intervals

The Boolean Ring of Intervals MATH 532 Lebesgue Measure Dr. Neal, WKU We ow shall apply the results obtaied about outer measure to the legth measure o the real lie. Throughout, our space X will be the set of real umbers R. Whe ecessary,

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Topic 18: Composite Hypotheses

Topic 18: Composite Hypotheses Toc 18: November, 211 Simple hypotheses limit us to a decisio betwee oe of two possible states of ature. This limitatio does ot allow us, uder the procedures of hypothesis testig to address the basic questio:

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Probability, Expectation Value and Uncertainty

Probability, Expectation Value and Uncertainty Chapter 1 Probability, Expectatio Value ad Ucertaity We have see that the physically observable properties of a quatum system are represeted by Hermitea operators (also referred to as observables ) such

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Axioms of Measure Theory

Axioms of Measure Theory MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that

More information

Binary classification, Part 1

Binary classification, Part 1 Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

Lecture 2: April 3, 2013

Lecture 2: April 3, 2013 TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Lecture 4 February 16, 2016

Lecture 4 February 16, 2016 MIT 6.854/18.415: Advaced Algorithms Sprig 16 Prof. Akur Moitra Lecture 4 February 16, 16 Scribe: Be Eysebach, Devi Neal 1 Last Time Cosistet Hashig - hash fuctios that evolve well Radom Trees - routig

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Math 4707 Spring 2018 (Darij Grinberg): homework set 4 page 1

Math 4707 Spring 2018 (Darij Grinberg): homework set 4 page 1 Math 4707 Sprig 2018 Darij Griberg): homewor set 4 page 1 Math 4707 Sprig 2018 Darij Griberg): homewor set 4 due date: Wedesday 11 April 2018 at the begiig of class, or before that by email or moodle Please

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

On Random Line Segments in the Unit Square

On Random Line Segments in the Unit Square O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,

More information

On the Linear Complexity of Feedback Registers

On the Linear Complexity of Feedback Registers O the Liear Complexity of Feedback Registers A. H. Cha M. Goresky A. Klapper Northeaster Uiversity Abstract I this paper, we study sequeces geerated by arbitrary feedback registers (ot ecessarily feedback

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

Lecture 9: Expanders Part 2, Extractors

Lecture 9: Expanders Part 2, Extractors Lecture 9: Expaders Part, Extractors Topics i Complexity Theory ad Pseudoradomess Sprig 013 Rutgers Uiversity Swastik Kopparty Scribes: Jaso Perry, Joh Kim I this lecture, we will discuss further the pseudoradomess

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 15

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 15 CS 70 Discrete Mathematics ad Probability Theory Summer 2014 James Cook Note 15 Some Importat Distributios I this ote we will itroduce three importat probability distributios that are widely used to model

More information

4 The Sperner property.

4 The Sperner property. 4 The Sperer property. I this sectio we cosider a surprisig applicatio of certai adjacecy matrices to some problems i extremal set theory. A importat role will also be played by fiite groups. I geeral,

More information

Lecture 4: April 10, 2013

Lecture 4: April 10, 2013 TTIC/CMSC 1150 Mathematical Toolkit Sprig 01 Madhur Tulsiai Lecture 4: April 10, 01 Scribe: Haris Agelidakis 1 Chebyshev s Iequality recap I the previous lecture, we used Chebyshev s iequality to get a

More information

Confidence interval for the two-parameter exponentiated Gumbel distribution based on record values

Confidence interval for the two-parameter exponentiated Gumbel distribution based on record values Iteratioal Joural of Applied Operatioal Research Vol. 4 No. 1 pp. 61-68 Witer 2014 Joural homepage: www.ijorlu.ir Cofidece iterval for the two-parameter expoetiated Gumbel distributio based o record values

More information

Binary codes from graphs on triples and permutation decoding

Binary codes from graphs on triples and permutation decoding Biary codes from graphs o triples ad permutatio decodig J. D. Key Departmet of Mathematical Scieces Clemso Uiversity Clemso SC 29634 U.S.A. J. Moori ad B. G. Rodrigues School of Mathematics Statistics

More information

Discrete Mathematics and Probability Theory Spring 2012 Alistair Sinclair Note 15

Discrete Mathematics and Probability Theory Spring 2012 Alistair Sinclair Note 15 CS 70 Discrete Mathematics ad Probability Theory Sprig 2012 Alistair Siclair Note 15 Some Importat Distributios The first importat distributio we leared about i the last Lecture Note is the biomial distributio

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Lecture 11: Pseudorandom functions

Lecture 11: Pseudorandom functions COM S 6830 Cryptography Oct 1, 2009 Istructor: Rafael Pass 1 Recap Lecture 11: Pseudoradom fuctios Scribe: Stefao Ermo Defiitio 1 (Ge, Ec, Dec) is a sigle message secure ecryptio scheme if for all uppt

More information

Sensitivity Analysis of Daubechies 4 Wavelet Coefficients for Reduction of Reconstructed Image Error

Sensitivity Analysis of Daubechies 4 Wavelet Coefficients for Reduction of Reconstructed Image Error Proceedigs of the 6th WSEAS Iteratioal Coferece o SIGNAL PROCESSING, Dallas, Texas, USA, March -4, 7 67 Sesitivity Aalysis of Daubechies 4 Wavelet Coefficiets for Reductio of Recostructed Image Error DEVINDER

More information

Analysis of Algorithms. Introduction. Contents

Analysis of Algorithms. Introduction. Contents Itroductio The focus of this module is mathematical aspects of algorithms. Our mai focus is aalysis of algorithms, which meas evaluatig efficiecy of algorithms by aalytical ad mathematical methods. We

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Exercises Advanced Data Mining: Solutions

Exercises Advanced Data Mining: Solutions Exercises Advaced Data Miig: Solutios Exercise 1 Cosider the followig directed idepedece graph. 5 8 9 a) Give the factorizatio of P (X 1, X 2,..., X 9 ) correspodig to this idepedece graph. P (X) = 9 P

More information

1 Statement of the Game

1 Statement of the Game ANALYSIS OF THE CHOW-ROBBINS GAME JON LU May 10, 2016 Abstract Flip a coi repeatedly ad stop wheever you wat. Your payoff is the proportio of heads ad you wish to maximize this payoff i expectatio. I this

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Skip lists: A randomized dictionary

Skip lists: A randomized dictionary Discrete Math for Bioiformatics WS 11/12:, by A. Bocmayr/K. Reiert, 31. Otober 2011, 09:53 3001 Sip lists: A radomized dictioary The expositio is based o the followig sources, which are all recommeded

More information

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples

More information

6.895 Essential Coding Theory October 20, Lecture 11. This lecture is focused in comparisons of the following properties/parameters of a code:

6.895 Essential Coding Theory October 20, Lecture 11. This lecture is focused in comparisons of the following properties/parameters of a code: 6.895 Essetial Codig Theory October 0, 004 Lecture 11 Lecturer: Madhu Suda Scribe: Aastasios Sidiropoulos 1 Overview This lecture is focused i comparisos of the followig properties/parameters of a code:

More information

On a Smarandache problem concerning the prime gaps

On a Smarandache problem concerning the prime gaps O a Smaradache problem cocerig the prime gaps Felice Russo Via A. Ifate 7 6705 Avezzao (Aq) Italy felice.russo@katamail.com Abstract I this paper, a problem posed i [] by Smaradache cocerig the prime gaps

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Shannon s noiseless coding theorem

Shannon s noiseless coding theorem 18.310 lecture otes May 4, 2015 Shao s oiseless codig theorem Lecturer: Michel Goemas I these otes we discuss Shao s oiseless codig theorem, which is oe of the foudig results of the field of iformatio

More information

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES Peter M. Maurer Why Hashig is θ(). As i biary search, hashig assumes that keys are stored i a array which is idexed by a iteger. However, hashig attempts to bypass

More information

CS322: Network Analysis. Problem Set 2 - Fall 2009

CS322: Network Analysis. Problem Set 2 - Fall 2009 Due October 9 009 i class CS3: Network Aalysis Problem Set - Fall 009 If you have ay questios regardig the problems set, sed a email to the course assistats: simlac@staford.edu ad peleato@staford.edu.

More information

11 Correlation and Regression

11 Correlation and Regression 11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record

More information

Fortgeschrittene Datenstrukturen Vorlesung 11

Fortgeschrittene Datenstrukturen Vorlesung 11 Fortgeschrittee Datestruture Vorlesug 11 Schriftführer: Marti Weider 19.01.2012 1 Succict Data Structures (ctd.) 1.1 Select-Queries A slightly differet approach, compared to ra, is used for select. B represets

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Classification of problem & problem solving strategies. classification of time complexities (linear, logarithmic etc)

Classification of problem & problem solving strategies. classification of time complexities (linear, logarithmic etc) Classificatio of problem & problem solvig strategies classificatio of time complexities (liear, arithmic etc) Problem subdivisio Divide ad Coquer strategy. Asymptotic otatios, lower boud ad upper boud:

More information

# fixed points of g. Tree to string. Repeatedly select the leaf with the smallest label, write down the label of its neighbour and remove the leaf.

# fixed points of g. Tree to string. Repeatedly select the leaf with the smallest label, write down the label of its neighbour and remove the leaf. Combiatorics Graph Theory Coutig labelled ad ulabelled graphs There are 2 ( 2) labelled graphs of order. The ulabelled graphs of order correspod to orbits of the actio of S o the set of labelled graphs.

More information

The method of types. PhD short course Information Theory and Statistics Siena, September, Mauro Barni University of Siena

The method of types. PhD short course Information Theory and Statistics Siena, September, Mauro Barni University of Siena PhD short course Iformatio Theory ad Statistics Siea, 15-19 September, 2014 The method of types Mauro Bari Uiversity of Siea Outlie of the course Part 1: Iformatio theory i a utshell Part 2: The method

More information

Entropies & Information Theory

Entropies & Information Theory Etropies & Iformatio Theory LECTURE I Nilajaa Datta Uiversity of Cambridge,U.K. For more details: see lecture otes (Lecture 1- Lecture 5) o http://www.qi.damtp.cam.ac.uk/ode/223 Quatum Iformatio Theory

More information

4.1 Data processing inequality

4.1 Data processing inequality ECE598: Iformatio-theoretic methods i high-dimesioal statistics Sprig 206 Lecture 4: Total variatio/iequalities betwee f-divergeces Lecturer: Yihog Wu Scribe: Matthew Tsao, Feb 8, 206 [Ed. Mar 22] Recall

More information

4.3 Growth Rates of Solutions to Recurrences

4.3 Growth Rates of Solutions to Recurrences 4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Privacy of Dependent Users Against Statistical Matching

Privacy of Dependent Users Against Statistical Matching Privacy of Depedet Users Agaist Statistical Matchig Nazai Takbiri Studet Member, IEEE, Amir Houmasadr Member, IEEE, Deis Goeckel Fellow, IEEE, Hossei Pishro-Nik Member, IEEE, Abstract Moder applicatios

More information

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

Lecture 12: November 13, 2018

Lecture 12: November 13, 2018 Mathematical Toolkit Autum 2018 Lecturer: Madhur Tulsiai Lecture 12: November 13, 2018 1 Radomized polyomial idetity testig We will use our kowledge of coditioal probability to prove the followig lemma,

More information

Chapter 6: Mining Frequent Patterns, Association and Correlations

Chapter 6: Mining Frequent Patterns, Association and Correlations Chapter 6: Miig Frequet Patters, Associatio ad Correlatios Basic cocepts Frequet itemset miig methods Costrait-based frequet patter miig (ch7) Associatio rules 1 What Is Frequet Patter Aalysis? Frequet

More information

Lecture 10: Universal coding and prediction

Lecture 10: Universal coding and prediction 0-704: Iformatio Processig ad Learig Sprig 0 Lecture 0: Uiversal codig ad predictio Lecturer: Aarti Sigh Scribes: Georg M. Goerg Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

HOMEWORK 2 SOLUTIONS

HOMEWORK 2 SOLUTIONS HOMEWORK SOLUTIONS CSE 55 RANDOMIZED AND APPROXIMATION ALGORITHMS 1. Questio 1. a) The larger the value of k is, the smaller the expected umber of days util we get all the coupos we eed. I fact if = k

More information

Lecture 16: Monotone Formula Lower Bounds via Graph Entropy. 2 Monotone Formula Lower Bounds via Graph Entropy

Lecture 16: Monotone Formula Lower Bounds via Graph Entropy. 2 Monotone Formula Lower Bounds via Graph Entropy 15-859: Iformatio Theory ad Applicatios i TCS CMU: Sprig 2013 Lecture 16: Mootoe Formula Lower Bouds via Graph Etropy March 26, 2013 Lecturer: Mahdi Cheraghchi Scribe: Shashak Sigh 1 Recap Graph Etropy:

More information

Rank Modulation with Multiplicity

Rank Modulation with Multiplicity Rak Modulatio with Multiplicity Axiao (Adrew) Jiag Computer Sciece ad Eg. Dept. Texas A&M Uiversity College Statio, TX 778 ajiag@cse.tamu.edu Abstract Rak modulatio is a scheme that uses the relative order

More information

Math 61CM - Solutions to homework 3

Math 61CM - Solutions to homework 3 Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig

More information

1036: Probability & Statistics

1036: Probability & Statistics 036: Probability & Statistics Lecture 0 Oe- ad Two-Sample Tests of Hypotheses 0- Statistical Hypotheses Decisio based o experimetal evidece whether Coffee drikig icreases the risk of cacer i humas. A perso

More information

General IxJ Contingency Tables

General IxJ Contingency Tables page1 Geeral x Cotigecy Tables We ow geeralize our previous results from the prospective, retrospective ad cross-sectioal studies ad the Poisso samplig case to x cotigecy tables. For such tables, the test

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

(6) Fundamental Sampling Distribution and Data Discription

(6) Fundamental Sampling Distribution and Data Discription 34 Stat Lecture Notes (6) Fudametal Samplig Distributio ad Data Discriptio ( Book*: Chapter 8,pg5) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye 8.1 Radom Samplig: Populatio:

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

Lecture 3 The Lebesgue Integral

Lecture 3 The Lebesgue Integral Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified

More information

Section 5.1 The Basics of Counting

Section 5.1 The Basics of Counting 1 Sectio 5.1 The Basics of Coutig Combiatorics, the study of arragemets of objects, is a importat part of discrete mathematics. I this chapter, we will lear basic techiques of coutig which has a lot of

More information

AN INTRODUCTION TO SPECTRAL GRAPH THEORY

AN INTRODUCTION TO SPECTRAL GRAPH THEORY AN INTRODUCTION TO SPECTRAL GRAPH THEORY JIAQI JIANG Abstract. Spectral graph theory is the study of properties of the Laplacia matrix or adjacecy matrix associated with a graph. I this paper, we focus

More information

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE School of Computer ad Commuicatio Scieces Hadout Iformatio Theory ad Sigal Processig Compressio ad Quatizatio November 0, 207 Data compressio Notatio Give a set

More information

7. Modern Techniques. Data Encryption Standard (DES)

7. Modern Techniques. Data Encryption Standard (DES) 7. Moder Techiques. Data Ecryptio Stadard (DES) The objective of this chapter is to illustrate the priciples of moder covetioal ecryptio. For this purpose, we focus o the most widely used covetioal ecryptio

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio

More information

There is no straightforward approach for choosing the warmup period l.

There is no straightforward approach for choosing the warmup period l. B. Maddah INDE 504 Discrete-Evet Simulatio Output Aalysis () Statistical Aalysis for Steady-State Parameters I a otermiatig simulatio, the iterest is i estimatig the log ru steady state measures of performace.

More information

CHAPTER 5. Theory and Solution Using Matrix Techniques

CHAPTER 5. Theory and Solution Using Matrix Techniques A SERIES OF CLASS NOTES FOR 2005-2006 TO INTRODUCE LINEAR AND NONLINEAR PROBLEMS TO ENGINEERS, SCIENTISTS, AND APPLIED MATHEMATICIANS DE CLASS NOTES 3 A COLLECTION OF HANDOUTS ON SYSTEMS OF ORDINARY DIFFERENTIAL

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated

More information

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram.

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram. Key Cocepts: 1) Sketchig of scatter diagram The scatter diagram of bivariate (i.e. cotaiig two variables) data ca be easily obtaied usig GC. Studets are advised to refer to lecture otes for the GC operatios

More information

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018) Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black

More information

Some Explicit Formulae of NAF and its Left-to-Right. Analogue Based on Booth Encoding

Some Explicit Formulae of NAF and its Left-to-Right. Analogue Based on Booth Encoding Vol.7, No.6 (01, pp.69-74 http://dx.doi.org/10.1457/ijsia.01.7.6.7 Some Explicit Formulae of NAF ad its Left-to-Right Aalogue Based o Booth Ecodig Dog-Guk Ha, Okyeo Yi, ad Tsuyoshi Takagi Kookmi Uiversity,

More information

SEQUENCES AND SERIES

SEQUENCES AND SERIES 9 SEQUENCES AND SERIES INTRODUCTION Sequeces have may importat applicatios i several spheres of huma activities Whe a collectio of objects is arraged i a defiite order such that it has a idetified first

More information

Vector Quantization: a Limiting Case of EM

Vector Quantization: a Limiting Case of EM . Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z

More information

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01 ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly

More information

Context-free grammars and. Basics of string generation methods

Context-free grammars and. Basics of string generation methods Cotext-free grammars ad laguages Basics of strig geeratio methods What s so great about regular expressios? A regular expressio is a strig represetatio of a regular laguage This allows the storig a whole

More information