A Monte Carlo Method to Data Stream Analysis


TRANSACTIONS ON ENGINEERING, COMPUTING AND TECHNOLOGY, VOLUME 14, AUGUST 2006

Kittisak Kerdprasop, Nittaya Kerdprasop, and Pairote Sattayatham

Abstract

Data stream analysis is the process of computing various summaries and derived values from large amounts of data which are continuously generated at a rapid rate. The nature of a stream does not allow a revisit of each data element. Furthermore, data processing must be fast to produce timely analysis results. These requirements impose constraints on the design of the algorithms to balance correctness against timely responses. Several techniques have been proposed over the past few years to address these challenges. These techniques can be categorized as either data-oriented or task-oriented. The data-oriented approach analyzes a subset of data or a smaller transformed representation, whereas the task-oriented scheme solves the problem directly via approximation techniques. We propose a hybrid approach to tackle the data stream analysis problem. The data stream is both statistically transformed to a smaller size and has its characteristics computationally approximated. We adopt a Monte Carlo method in the approximation step. The data reduction is performed horizontally and vertically through our EMR sampling method. The proposed method is analyzed by a series of experiments. We apply our algorithm on clustering and classification tasks to evaluate the utility of our approach.

Keywords: Data Stream, Monte Carlo, Sampling, Density Estimation.

I. INTRODUCTION

Data analysis is the process of computing various summaries and derived values from collected data. Data mining can be viewed as an intelligent data analysis aiming at extracting valuable knowledge from large amounts of information stored in data repositories [1], [3]. The techniques used in data mining have been adopted from the areas of machine learning and statistics, but scaled to deal with the problem of huge repositories of information.
The recent advances in hardware and software have enabled the rapid generation of continuous streams of information such as customer click streams, telephone records, retail chain transactions, web page visits, and so on. Mining stream data that grow at an unlimited rate poses a new challenge to researchers and practitioners in the area of data mining [1], [9]. A data stream is defined as massive amounts of data continuously generated at a rapid rate, possibly time-varying and unpredictable [2], [9]. Major characteristics of data streams are the continuous online arrival of data elements, the uncontrolled order of such elements upon arrival, variable sizes, and one-time processing of an element before it is discarded or archived, due to the massive size of data that far exceeds the storage capacity.

The requirements of timely analysis and efficient memory usage constrain most data stream mining algorithms to sacrifice accuracy of the analysis results for fast and feasible processing. Development of approximation algorithms [5], [13] is a direct solution to the problem of data stream mining. However, the large volumes of data continuously arriving in a stream could eventually make the algorithms inefficient. A more practical solution is to apply a data reduction technique along with the approximation algorithms.

This work was supported by the Thailand Research Fund under Grant MRG, the National Research Council, and Suranaree University of Technology through its sponsorship of the Data Engineering and Knowledge Discovery Research Unit. Kittisak Kerdprasop is with the School of Computer Engineering and is the director of the Data Engineering and Knowledge Discovery Research Unit, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand (kerdpras@sut.ac.th). Nittaya Kerdprasop is with the School of Computer Engineering and is a member of the Data Engineering and Knowledge Discovery Research Unit, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand (nittaya@sut.ac.th). Pairote Sattayatham is with the School of Mathematics, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand (pairote@sut.ac.th).
Data summarization techniques, such as wavelet analysis [10] and histograms [2], have been proposed as synopsis data structures to provide a summary presentation of data. The issue of dynamic space allocation as the underlying data distribution changes over time is a fundamental problem of these approaches. Data stream analysis by choosing a subset of the incoming stream is another class of techniques for producing approximate results. Sampling is a statistical technique widely used to scale up mining algorithms [7]. Nevertheless, in the context of a data stream, in which the data size is unknown, simply applying a sampling method cannot give a reliable approximation.

We therefore propose a Monte Carlo method to draw representatives from a data stream. Monte Carlo simulation is a widely used method to produce a good approximation to a true value or quantity. Our algorithm has been designed to produce data elements from which the approximate analysis is close to the exact one. We perform cluster and classification analyses on several data sets to verify the reliability of the method.

The paper is organized as follows. Section 2 presents the theoretical background of a general Monte Carlo method. Section 3 sketches the draft idea of density estimation from a sample. Our proposed method, which is efficiently applicable to data stream analysis, is explained in Section 4. Some of the experimental results from cluster and classification analyses over the reduced data stream are shown in Section 5. We conclude in Section 6 with a discussion of future work.

ENFORMATIKA V14 2006, WORLD ENFORMATIKA SOCIETY

II. PRINCIPLES OF MONTE CARLO

The Monte Carlo method is a class of stochastic algorithms for simulating the behavior of physical or mathematical systems [11], [14]. The term stochastic implies that the methods are non-deterministic: they are based on the use of random numbers and probability statistics to investigate a problem. To understand the Monte Carlo method, it is useful to think of it as a general technique of numerical integration. Suppose we need to evaluate the $d$-dimensional integral of a function $f$ over the unit hypercube $(0,1)^d$,

$$\alpha = \int_0^1 \!\cdots\! \int_0^1 f(x_1, x_2, \ldots, x_d)\, dx_1\, dx_2 \cdots dx_d = \int_{(0,1)^d} f(x)\, dx. \tag{1}$$

The integral is a non-random problem, but the Monte Carlo method represents it as an approximation problem by introducing a random vector $U$ that is uniformly distributed on $(0,1)^d$. Applying the function $f$ to $U$, we obtain a random variable $f(U)$ with expectation

$$E[f(U)] = \int_{(0,1)^d} f(x)\, \varphi(x)\, dx \tag{2}$$

where $\varphi$ is the probability density function of $U$. Since the value of $\varphi$ on the region of integration is 1, equation (2) becomes

$$E[f(U)] = \int_{(0,1)^d} f(x)\, dx. \tag{3}$$

Equations (1) and (3) allow us to represent the integral $\alpha$ as a probabilistic expression:

$$\alpha = E[f(U)]. \tag{4}$$

To estimate $\alpha$, we need a mechanism for drawing points $U_1, U_2, \ldots, U_n$. Applying the function $f$ to each of these random points yields independent and identically distributed (iid) random variables $f(U_1), f(U_2), \ldots, f(U_n)$, each with expectation $\alpha$ and standard deviation $\sigma$. Averaging the results produces the Monte Carlo estimator

$$\hat{\alpha}_n = \frac{1}{n} \sum_{i=1}^{n} f(U_i) \tag{5}$$

which is an unbiased estimator for $\alpha$, with the error $\hat{\alpha}_n - \alpha$ approximately normally distributed with mean 0 and standard deviation $\sigma/\sqrt{n}$.

The form $\sigma/\sqrt{n}$ of the standard error is an important property of Monte Carlo methods. First, it tells us that if we increase the number of samples by a factor of four, we halve the standard error. Second, the standard error does not depend on the dimensionality $d$ of the integral: a Monte Carlo estimator based on $n$ draws from the domain $[0,1]^d$ still has a standard error of the form $\sigma/\sqrt{n}$ for all dimensions $d$.
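The estimator in equation (5) and its standard error can be illustrated with a minimal sketch. The integrand `product_of_coords`, the sample sizes, and the seed are illustrative assumptions and do not come from the paper; the sketch only shows that quadrupling the number of draws roughly halves the estimated standard error.

```python
import math
import random


def mc_integrate(f, d, n, rng):
    """Estimate the integral of f over [0,1]^d from n uniform draws.

    Returns the Monte Carlo estimate and its estimated standard error.
    """
    values = [f([rng.random() for _ in range(d)]) for _ in range(n)]
    mean = sum(values) / n
    # Sample variance of f(U); the standard error is sigma / sqrt(n).
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


def product_of_coords(x):
    """Hypothetical test integrand: x_1 * ... * x_d, whose exact integral is 0.5**d."""
    p = 1.0
    for xj in x:
        p *= xj
    return p


rng = random.Random(0)  # fixed seed so the run is repeatable
est_small, se_small = mc_integrate(product_of_coords, 3, 10_000, rng)
est_large, se_large = mc_integrate(product_of_coords, 3, 40_000, rng)
print(est_small, se_small)
print(est_large, se_large)
```

The same function works unchanged for any dimension `d`, which mirrors the point that the form of the standard error does not depend on the dimensionality of the integral.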
Most techniques of numerical integration, such as the trapezoidal rule, degrade in convergence rate with increasing dimensions. We consider a Monte Carlo method to be useful in the domain of data stream analysis, in which the amount of data is overwhelming and the exact data distribution is unknown. The focus of our study is to generate samples from stream data, which is a prior step to data modeling and analysis. Once the samples have been successfully drawn, the characteristics of the stream can be estimated. We concentrate on the sampling problem because it can provide a satisfactory estimation, which will be shown through experiments on cluster and classification analyses.

III. SAMPLING METHOD AND DENSITY ESTIMATION

Basically, the Monte Carlo method employs any technique of statistical sampling to approximate solutions to quantitative problems. With the Monte Carlo method, a large system can be sampled in a number of random configurations, and that data can be used to describe the system as a whole. The efficiency of the method depends largely on the ability to draw samples effectively. For the particular domain of stream data, we consider the rejection sampling method.

Rejection sampling, or acceptance-rejection sampling, is a sampling method first introduced by von Neumann [16]. This method is used in cases where a target distribution, $f(x)$, is too complicated for us to sample from directly. Suppose we have a simpler distribution, $g(x)$, which we can evaluate and generate samples from; then the difficult sampling problem can be avoided by sampling $x$ from $g(x)$ instead. By generating a uniform random variable $u$ from the interval $[0,1]$, we accept $x$ if the condition $u \le f(x) / (C g(x))$ holds; otherwise we reject the value of $x$ and repeat the sampling step. Posing the restriction $C g(x) \ge f(x)$ for some $C > 1$, we say that $Cg$ envelopes $f$. The validation of this method is the envelope principle. When simulating the point $(x, v)$ where $v = u \cdot C g(x)$, we produce a uniform simulation over the subgraph of $C g(x)$.
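The accept/reject step described so far can be sketched as follows. The target density (a Beta(2,2) density, $f(x) = 6x(1-x)$ on $[0,1]$), the uniform proposal, and the constant $C = 1.5$ are illustrative assumptions chosen so that $C g(x) \ge f(x)$ holds everywhere; they are not taken from the paper.

```python
import random


def rejection_sample(f, g_sample, g_pdf, C, n, rng):
    """Draw n samples from density f using proposal g and envelope C * g.

    Returns the accepted samples and the number of proposals that were needed.
    """
    samples = []
    proposals = 0
    while len(samples) < n:
        x = g_sample(rng)   # candidate drawn from the simple distribution g
        u = rng.random()    # uniform variable on [0, 1)
        proposals += 1
        if u <= f(x) / (C * g_pdf(x)):
            samples.append(x)  # accept; otherwise reject and repeat
    return samples, proposals


def target_pdf(x):
    """Hypothetical target: Beta(2,2) density, maximum 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)


rng = random.Random(42)
samples, proposals = rejection_sample(
    target_pdf,
    g_sample=lambda r: r.random(),  # proposal g: uniform on [0, 1)
    g_pdf=lambda x: 1.0,
    C=1.5,                          # smallest C with C * g(x) >= f(x)
    n=5000,
    rng=rng,
)
print(sum(samples) / len(samples), len(samples) / proposals)
```

The fraction of accepted proposals approaches $1/C$, which is why a large $C$ translates directly into a high rejection rate.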
Accepting only points such that $u \le f(x) / (C g(x))$ then produces points $(x, v)$ uniformly distributed over the subgraph of $f(x)$ and thus, marginally, a simulation from $f(x)$. Rejection sampling works best if $g$ is a good approximation to $f$. However, in a high-dimensional problem the value of $C$ needs to be chosen very large to ensure the requirement $C g(x) > f(x)$ for all $x$. The result is an enormous rejection rate.

The difficulty of applying the rejection sampling method directly to the problem of data stream analysis is that we do not know beforehand where the modes of $f$ are located or how high they are. In other words, we do not know the exact characteristics of the target density. We thus propose to apply the EM (Expectation-Maximization) technique [6] to approximate the density $f(x)$. We consider multi-dimensional stream data as mixtures of Gaussian, or normal, probability density functions (pdf). Gaussian mixtures [8], [12] are combinations of Gaussian distributions written as

$$g(x) = \sum_{i=1}^{K} p_i\, f(x \mid \theta_i). \tag{6}$$

A random variable $x$ denotes an independent observation in $K$ mixture components. The $p_i$ are the mixing proportions, $0 < p_i < 1$ for all $i = 1, \ldots, K$, and $p_1 + \cdots + p_K = 1$. The $f(x \mid \theta_i)$ denotes the density of a $d$-dimensional Gaussian distribution with mean vector $\mu_i$ and covariance matrix $\Sigma_i$, that is $\theta_i = (\mu_i, \Sigma_i)$, and the Gaussian pdf is given by [4], [15]

$$g_{(\mu,\Sigma)}(x) = \frac{1}{(2\pi)^{d/2} \sqrt{\det(\Sigma)}} \exp\left\{ -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right\}. \tag{7}$$

By varying the number of Gaussians $K$, the mixing proportions $p_i$, and the parameter $\theta_i$ of each Gaussian density function, Gaussian mixtures can be used to describe any complex pdf (Fig. 1).

Fig. 1 One-dimensional Gaussian mixture densities for K = 3 (first row) and a second value of K (second row). The left column shows the histogram of the Gaussian density; the right column gives the corresponding Gaussian mixture pdf

In stream data a mixture density $\sum_i p_i f(x \mid \theta_i)$ has been observed with unknown parameters $\theta_i$ and $p_i$. To find the parameters that optimally fit a mixture model to a given set of data, the EM algorithm [6], [12], [15] can be used. The EM algorithm is a broadly applicable approach to the iterative computation of maximum likelihood estimates. For a set of iid samples $X = \{x_1, \ldots, x_N\}$ drawn from a data generation model $f(x \mid \theta)$ with $\theta = (\mu, \Sigma)$, the resulting density for the samples is

$$\prod_{i=1}^{N} f(x_i \mid \theta) = L(\theta \mid X). \tag{8}$$

The function $L(\theta \mid X)$ is the likelihood of the parameters given the data. In the maximum likelihood problem, the goal is to find the $\theta$ that maximizes $L$, that is $\theta^{*} = \arg\max_{\theta} L(\theta \mid X)$. In the Gaussian case, the computation of the exponential can be avoided by maximizing $\log L(\theta \mid X)$ instead of $L(\theta \mid X)$.

The EM algorithm is an approach to finding the maximum of likelihood functions in incomplete data problems. Let $X$ be the observed data, $Z$ be the unobserved data, and $Y = X \cup Z$ be the full data set. The probability distribution of $Z$ depends on $X$ and the unknown parameter $\theta$. Given an initial parameter $\theta^{(0)}$, the EM algorithm produces a sequence $\{\theta^{(0)}, \theta^{(1)}, \theta^{(2)}, \ldots\}$ that converges to a stationary point of the likelihood function.

IV. EMR SAMPLING

In our particular case of data stream analysis, we assume that the observed data have a normal distribution. Given a specific number of models, the EM algorithm is applied to estimate the mean of each model.
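To make the EM iteration concrete, here is a minimal sketch for a one-dimensional mixture of two Gaussians. It is an illustration under stated assumptions, not the authors' implementation: the function names, the synthetic data, the fixed initial means (the paper initializes with k-means), the unit initial variances, and the fixed iteration count are all hypothetical choices.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a one-dimensional Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, init_means, iterations=50):
    """Fit a one-dimensional Gaussian mixture by EM.

    Returns (means, variances, priors), one entry per mixture component.
    """
    k, n = len(init_means), len(data)
    means = list(init_means)
    variances = [1.0] * k    # assumed starting variances
    priors = [1.0 / k] * k   # equal mixing proportions to start
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [priors[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([w / total for w in joint])
        # M-step: re-estimate means, variances, and priors from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            priors[j] = nj / n
    return means, variances, priors


rng = random.Random(1)
# Synthetic data: two well-separated Gaussian components (illustrative only).
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(300)]
means, variances, priors = em_gmm_1d(data, init_means=[1.0, 5.0])
print(means, variances, priors)
```

A fixed iteration count is used here for brevity; a likelihood-based stopping rule could replace it without changing the E-step and M-step.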
These mean values are scaled up to produce an upper bound for the underlying, partially observed target density. The idea of the proposed method is illustrated in Fig. 2. The target function is represented as a one-dimensional mixture of three Gaussians (the three solid lines at the bottom of Fig. 2) from which we want to draw samples. The density $E(x)$ is estimated with the upper bound requirement that $E(x) > f(x)$ for all $x$. $E'(x)$ is the approximation (shown as a thick dashed line in Fig. 2) of the unknown target density. A broad distance between $E$ and $E'$ (e.g., at x = 1) represents a rejection area, whereas a narrow distance (e.g., at x = 6.5) is an acceptance one.

It should be noted that EM requires a pre-specified number $K$ of components to be incorporated into the mixture models. In our proposed method, a suitable number should be selected by the user. To cope with multi-dimensional problems, we propose to use a statistical method, principal component analysis (PCA), to reduce the complicated problem to a simpler two-dimensional problem. That is, we take into account only the first and second principal components of the data set. The two-dimensional data are used to train the EM algorithm to estimate the parameters $\mu$ and $\Sigma$ of the Gaussian mixture models. The estimated Gaussian pdf is a distribution $E$ (as shown in Fig. 2). To sample from the estimated density we scale up this distribution to obtain an approximate $E'$, which is a simpler distribution that we can evaluate and generate samples from. The outline of our EMR sampling algorithm is illustrated in Fig. 3. The subroutine Density_Estimator, which approximates the density function, is shown in Fig. 4.

Fig. 2 EM-based rejection (EMR) sampling

From the estimated density $E$ and the rough approximate $E'$, we perform rejection sampling with the decision criterion $E(x) / (d\, E'(x)) \ge u$, where $u$ is a uniform variable distributed between 0 and 1, and $d$ is the dimensionality of the data. The input from the stream data is taken one item at a time. A data item that satisfies the criterion is included in the sample until the specified sample size is completely filled. The sample data set is then a representative of the whole stream, and any analysis method can be performed on this set.

Input:
  - a d-dimensional data set D with N points
  - an integer K to specify the number of models, and
  - a sample size SS
Output:
  - a sample set S drawn from the mixture models

// Data preprocessing steps //
1. If d > 0 then apply PCA to obtain the 1st and 2nd components
2. Transform D to a two-dimensional data set X
// Density estimation with EM and getting a rough pdf E'(X) //
3. Set max_iteration to the larger of a fixed constant and a constant multiple of K
4. (E(X), E'(X)) = Density_Estimator(X, K, max_iteration)
5. Set count = 0
6. While count < SS
   // Sampling steps //
7.    Sample x from E(X)
8.    Generate u from U(0,1)
9.    If u <= E(x) / (d E'(x)) then accept x, add it to S, and increment count
10. Return S

Fig. 3 EMR sampling algorithm

Density_Estimator(X, K, max_iteration)
1. Initialize the parameter $\theta_k = (\mu_k, \Sigma_k)$ for each of the K Gaussian models by running K-means
2. Initialize the prior probability $P(m_k)$ of each model $m_k$ to 1/K, k = 1, ..., K
3. Repeat
4.    Compute the probability
      $$P(m_k \mid x_n, \theta^{(i)}) = \frac{P(m_k)^{(i)}\, p(x_n \mid \mu_k^{(i)}, \Sigma_k^{(i)})}{\sum_{j} P(m_j)^{(i)}\, p(x_n \mid \mu_j^{(i)}, \Sigma_j^{(i)})}$$
5.    Update means $\mu_k$, variances $\Sigma_k$, and priors $P(m_k)$
      $$\mu_k^{(i+1)} = \frac{\sum_{n=1}^{N} x_n\, P(m_k \mid x_n, \theta^{(i)})}{\sum_{n=1}^{N} P(m_k \mid x_n, \theta^{(i)})}$$
      $$\Sigma_k^{(i+1)} = \frac{\sum_{n=1}^{N} P(m_k \mid x_n, \theta^{(i)})\, (x_n - \mu_k^{(i+1)})(x_n - \mu_k^{(i+1)})^{T}}{\sum_{n=1}^{N} P(m_k \mid x_n, \theta^{(i)})}$$
      $$P(m_k)^{(i+1)} = \frac{1}{N} \sum_{n=1}^{N} P(m_k \mid x_n, \theta^{(i)})$$
6. Until max_iteration has been reached or the joint likelihood of all data with respect to all the models is greater than the lower boundary criterion $CL(\theta)$
      $$L(\theta) \ge CL(\theta) = \sum_{k=1}^{K} \sum_{n=1}^{N} P(m_k \mid x_n, \theta)\, \log p(x_n \mid \theta_k)$$
7. Return $\theta_k^{(i)} = (\mu_k^{(i)}, \Sigma_k^{(i)})$ for k = 1, ..., K, and a rough $\theta_k^{(r)} = (\mu_k^{(r)}, \Sigma_k^{(r)})$ from r iterations, r < i

Fig. 4 Density_Estimator algorithm

V. EXPERIMENTATIONS

A. Evaluation of Density Estimator

The objective of our initial experiments is to empirically evaluate the closeness of the estimated density to the real one.
The closeness is determined by comparing the Euclidean distance of the estimated mean vector to the original mean vector, and by comparing the estimated covariance matrix to the original covariance matrix. We use a synthetic data generator to produce two-dimensional Gaussian mixtures. The number of mixture models, the number of points in each model, the original mean vector, and the covariance matrix are input parameters. We vary the number of models from 2 upward, with up to 1,000 data points in each model. To properly initialize the component means for the $\theta$-parameter learning, we find the approximate mean points by running a bounded number of iterations (max_iteration in Fig. 3) of the k-means algorithm [17]. Component elements and main-diagonal covariance matrix elements are also initialized accordingly, and off-diagonal matrix elements are constrained to zero.

Some of our experimental results on the accuracy of our density estimator, compared with simple uniform sampling, are illustrated in Table I. The EMR sampling results are compared against uniform sampling, which always assumes a single Gaussian model. The efficiency of the sampling methods is evaluated on the basis of the closeness of the estimated $\theta_i = (\mu_i, \Sigma_i)$ to the original means and covariance matrices of the generative models. The $\mu$-differences and $\Sigma$-differences are averaged over the K models. The experimental results confirm the applicability of the EMR approach to the problem of $\theta$-parameter approximation. The estimated means and variances are very close to the original parameter values.

TABLE I
EXPERIMENTAL RESULTS OF EMR SAMPLING FROM VARIOUS MIXING OF GAUSSIAN MODELS
Columns: Number of Mixture Models | EMR Sampling: $\mu$-difference, $\Sigma$-difference | Uniform Sampling: $\mu$-difference, $\Sigma$-difference
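Returning to the sampling loop of Fig. 3 (steps 6 to 9), the acceptance test can be sketched as below. This is a rough illustration under several assumptions: the data are one-dimensional (so `d = 1`), stream items are tested one at a time as the text describes, and both densities are supplied as ready-made Gaussian mixtures. The names `mixture_pdf` and `emr_sample`, the component parameters, and the synthetic stream are hypothetical, and how $E'$ is obtained in practice is not modeled.

```python
import math
import random


def mixture_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components is a list of (weight, mean, variance)."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in components
    )


def emr_sample(stream, e_pdf, e_rough_pdf, d, sample_size, rng):
    """Accept stream items one by one while u <= E(x) / (d * E'(x)) until the sample is full."""
    sample = []
    for x in stream:
        if len(sample) >= sample_size:
            break
        denom = d * e_rough_pdf(x)
        if denom <= 0.0:
            continue  # density underflow: skip the item rather than divide by zero
        u = rng.random()
        if u <= e_pdf(x) / denom:
            sample.append(x)
    return sample


rng = random.Random(7)
# Hypothetical stream: items from two Gaussian sources, arriving in random order.
stream = [rng.gauss(0.0, 1.0) for _ in range(2500)] + [rng.gauss(6.0, 1.0) for _ in range(2500)]
rng.shuffle(stream)

estimated = [(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)]  # stands in for E
rough = [(0.5, 0.0, 4.0), (0.5, 6.0, 4.0)]      # stands in for E' (assumed broader)

sample = emr_sample(
    stream,
    e_pdf=lambda x: mixture_pdf(x, estimated),
    e_rough_pdf=lambda x: mixture_pdf(x, rough),
    d=1,
    sample_size=200,
    rng=rng,
)
print(len(sample))
```

With these stand-in densities the ratio is large near the component means and small in the tails, so items near the modes are accepted more readily than outlying items.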

B. Cluster and Classification Analyses

To verify the utility of the proposed method on real-world data, we run the k-means clustering algorithm [17] on various sample data from the UCI repository [uci.edu/~mlearn/MLRepository.html]. We test our algorithm on four data sets: Wisconsin diagnostic breast cancer (466 data points, 2 classes), diabetes (512 data points, 2 classes), DNA (2000 data points, 3 classes), and satellite image (4435 data points, 6 classes). In each data set, we assume that the class labels are the correct clusters to be found by the k-means algorithm. By assuming prior knowledge about the known clusters, we can evaluate the error rate of the cluster learning. To evaluate the efficiency of the Monte Carlo approach, we simulate a data stream by generating several samples for each data set. In our experiments we observe the performance of cluster learning on samples of increasing size (1%, 5%, 10%, 15%, and so on) and on the complete data set. The experimental results are shown in Fig. 5. The clustering results reveal the efficiency of the proposed method: a sample of only a modest fraction of the data is sufficient for accurate learning of the data clusters. The classification task has been performed in the same experimental setting with the C4.5 algorithm [17]. The results are shown in Fig. 6.

VI. CONCLUSION

In this paper we propose a technique of Monte Carlo estimation to analyze major characteristics of a data stream. At the sampling phase of the Monte Carlo method we propose the EMR sampling algorithm to efficiently draw representative samples from data containing mixture models. We propose to apply the expectation-maximization technique to estimate the means and variances of the mixture models. The algorithm Density_Estimator produces two density functions, $E$ and $E'$. The distance between $E$ and $E'$ at each sampling point is the decision criterion for either sample acceptance or rejection.
A narrow distance between the two estimated densities leads to acceptance if the distance ratio is greater than the uniform random variable generated from the interval [0, 1]. The experimental results verify the utility of the proposed Density_Estimator algorithm and the EMR sampling method. The clustering and classification experiments on real-world data also confirm the efficiency of our method. We plan to further our study on skewed data in which the distributions are not uniformly distributed.

REFERENCES

[1] C. Aggarwal, J. Han, J. Wang, and P. Yu, "A framework for clustering evolving data streams," in Proc. Very Large Data Bases, 2003.
[2] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Models and issues in data stream systems," in Proc. ACM PODS, 2002.
[3] M. Berthold and D. J. Hand, Intelligent Data Analysis: An Introduction. Springer-Verlag, 2003.
[4] J. Bilmes, "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models," Dept. Electrical Engineering and Computer Science, University of California Berkeley, Technical Report TR.
[5] G. Cormode and S. Muthukrishnan, "What's hot and what's not: Tracking most frequent items dynamically," in Proc. ACM PODS, 2003.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society B, vol. 39, pp. 1-22.
[7] P. Domingos and G. Hulten, "A general method to scaling up machine learning algorithms and its application to clustering," in Proc. ICML, 2001.
[8] M. A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, 2002.
[9] M. Gaber, A. Zaslavsky, and S. Krishnaswamy, "Mining data stream: A review," SIGMOD Record, vol. 34, 2005.
[10] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "One-pass wavelet decompositions of data streams," IEEE Trans. Knowledge and Data Engineering, vol. 15, 2003.
[11] D. MacKay, "Introduction to Monte Carlo," in Learning in Graphical Models, M. Jordan, Ed. MIT Press, 1996.
[12] J. M. Marin, K. Mengersen, and C. Robert, "Bayesian modelling and inference on mixtures of distributions," in Handbook of Statistics, Elsevier-Science, 2005.
[13] S. Muthukrishnan, "Data streams: Algorithms and applications," in Proc. ACM-SIAM Symposium on Discrete Algorithms, 2003.
[14] R. Neal, "Probabilistic inference using Markov chain Monte Carlo methods," Dept. Computer Science, University of Toronto, Technical Report CRG-TR93-1.
[15] B. Resch, "A tutorial for the course computational intelligence."
[16] J. von Neumann, "Various techniques used in connection with random digits," Applied Mathematics Series, vol. 12, National Bureau of Standards, Washington, D.C.
[17] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.

Fig. 5 Clustering results on four data sets
Fig. 6 Classification results on four data sets
(Figure panels: Wisconsin diagnostic breast cancer, Diabetes, DNA, Satellite image. Plotted series are labeled Uniform Sampling, EMR Sampling, and Sampling from Estimated Distribution.)

A COMPUTATIONAL STUDY UPON THE BURR 2-DIMENSIONAL DISTRIBUTION

A COMPUTATIONAL STUDY UPON THE BURR 2-DIMENSIONAL DISTRIBUTION TOME VI (year 8), FASCICULE 1, (ISSN 1584 665) A COMPUTATIONAL STUDY UPON THE BURR -DIMENSIONAL DISTRIBUTION MAKSAY Ştefa, BISTRIAN Diaa Alia Uiversity Politehica Timisoara, Faculty of Egieerig Hueoara

More information

Expectation maximization

Expectation maximization Motivatio Expectatio maximizatio Subhrasu Maji CMSCI 689: Machie Learig 14 April 015 Suppose you are builig a aive Bayes spam classifier. After your are oe your boss tells you that there is o moey to label

More information

(average number of points per unit length). Note that Equation (9B1) does not depend on the

(average number of points per unit length). Note that Equation (9B1) does not depend on the EE603 Class Notes 9/25/203 Joh Stesby Appeix 9-B: Raom Poisso Poits As iscusse i Chapter, let (t,t 2 ) eote the umber of Poisso raom poits i the iterval (t, t 2 ]. The quatity (t, t 2 ) is a o-egative-iteger-value

More information

Inhomogeneous Poisson process

Inhomogeneous Poisson process Chapter 22 Ihomogeeous Poisso process We coclue our stuy of Poisso processes with the case of o-statioary rates. Let us cosier a arrival rate, λ(t), that with time, but oe that is still Markovia. That

More information

Chapter 2 Transformations and Expectations

Chapter 2 Transformations and Expectations Chapter Trasformatios a Epectatios Chapter Distributios of Fuctios of a Raom Variable Problem: Let be a raom variable with cf F ( ) If we efie ay fuctio of, say g( ) g( ) is also a raom variable whose

More information

Expectation-Maximization Algorithm.

Expectation-Maximization Algorithm. Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................

More information

Commonly Used Distributions and Parameter Estimation

Commonly Used Distributions and Parameter Estimation Commoly Use Distributios a Parameter stimatio Berli Che Departmet of Computer Sciece & Iformatio gieerig Natioal Taiwa Normal Uiversity Referece:. W. Navii. Statistics for gieerig a Scietists. Chapter

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Monte Carlo Integration

Monte Carlo Integration Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce

More information

6.867 Machine learning, lecture 11 (Jaakkola)

6.867 Machine learning, lecture 11 (Jaakkola) 6.867 Machie learig, lecture 11 (Jaakkola) 1 Lecture topics: moel selectio criteria Miimum escriptio legth (MDL) Feature (subset) selectio Moel selectio criteria: Miimum escriptio legth (MDL) The miimum

More information

Lecture 6 Testing Nonlinear Restrictions 1. The previous lectures prepare us for the tests of nonlinear restrictions of the form:

Lecture 6 Testing Nonlinear Restrictions 1. The previous lectures prepare us for the tests of nonlinear restrictions of the form: Eco 75 Lecture 6 Testig Noliear Restrictios The previous lectures prepare us for the tests of oliear restrictios of the form: H 0 : h( 0 ) = 0 versus H : h( 0 ) 6= 0: () I this lecture, we cosier Wal,

More information

The Chi Squared Distribution Page 1

The Chi Squared Distribution Page 1 The Chi Square Distributio Page Cosier the istributio of the square of a score take from N(, The probability that z woul have a value less tha is give by z / g ( ( e z if > F π, if < z where ( e g e z

More information

The Expectation-Maximization (EM) Algorithm

The Expectation-Maximization (EM) Algorithm The Expectatio-Maximizatio (EM) Algorithm Readig Assigmets T. Mitchell, Machie Learig, McGraw-Hill, 997 (sectio 6.2, hard copy). S. Gog et al. Dyamic Visio: From Images to Face Recogitio, Imperial College

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

High dimensional Sobol Sequences and their applications

High dimensional Sobol Sequences and their applications High imesioal Sobol Sequeces a their applicatios S. Kuchereko Email: s.kuchereko@broa.co.uk BRODA Lt. www.broa.co.uk 1. Evaluatio of high imesioal itegrals usig MC a QMC methos There are three mai applicatios

More information

SOME RESULTS RELATED TO DISTRIBUTION FUNCTIONS OF CHI-SQUARE TYPE RANDOM VARIABLES WITH RANDOM DEGREES OF FREEDOM

SOME RESULTS RELATED TO DISTRIBUTION FUNCTIONS OF CHI-SQUARE TYPE RANDOM VARIABLES WITH RANDOM DEGREES OF FREEDOM Bull Korea Math Soc 45 (2008), No 3, pp 509 522 SOME RESULTS RELATED TO DISTRIBUTION FUNCTIONS OF CHI-SQUARE TYPE RANDOM VARIABLES WITH RANDOM DEGREES OF FREEDOM Tra Loc Hug, Tra Thie Thah, a Bui Quag

More information

Mixtures of Gaussians and the EM Algorithm

Mixtures of Gaussians and the EM Algorithm Mixtures of Gaussias ad the EM Algorithm CSE 6363 Machie Learig Vassilis Athitsos Computer Sciece ad Egieerig Departmet Uiversity of Texas at Arligto 1 Gaussias A popular way to estimate probability desity

More information

u t + f(u) x = 0, (12.1) f(u) x dx = 0. u(x, t)dx = f(u(a)) f(u(b)).

u t + f(u) x = 0, (12.1) f(u) x dx = 0. u(x, t)dx = f(u(a)) f(u(b)). 12 Fiite Volume Methos Whe solvig a PDE umerically, how o we eal with iscotiuous iitial ata? The Fiite Volume metho has particular stregth i this area. It is commoly use for hyperbolic PDEs whose solutios

More information

6.3.3 Parameter Estimation

6.3.3 Parameter Estimation 130 CHAPTER 6. ARMA MODELS 6.3.3 Parameter Estimatio I this sectio we will iscuss methos of parameter estimatio for ARMAp,q assumig that the orers p a q are kow. Metho of Momets I this metho we equate

More information

A proposed discrete distribution for the statistical modeling of

A proposed discrete distribution for the statistical modeling of It. Statistical Ist.: Proc. 58th World Statistical Cogress, 0, Dubli (Sessio CPS047) p.5059 A proposed discrete distributio for the statistical modelig of Likert data Kidd, Marti Cetre for Statistical

More information

Representing Functions as Power Series. 3 n ...

Representing Functions as Power Series. 3 n ... Math Fall 7 Lab Represetig Fuctios as Power Series I. Itrouctio I sectio.8 we leare the series c c c c c... () is calle a power series. It is a uctio o whose omai is the set o all or which it coverges.

More information

Mechatronics II Laboratory Exercise 5 Second Order Response
