A Simple Probabilistic Explanation of Term Frequency-Inverse Document Frequency (tf-idf) Heuristic (and Variations Motivated by This Explanation)
University of Texas at El Paso, Departmental Technical Reports (CS), Department of Computer Science

Lukas Havrlant, Palacky University Olomouc, lukashavrlant@upol.cz
Vladik Kreinovich, University of Texas at El Paso, vladik@utep.edu

Comments: Technical Report: UTEP-CS

Recommended Citation: Havrlant, Lukas and Kreinovich, Vladik, "A Simple Probabilistic Explanation of Term Frequency-Inverse Document Frequency (tf-idf) Heuristic (and Variations Motivated by This Explanation)" (2014). Departmental Technical Reports (CS).
To appear in the International Journal of General Systems, Vol. 00, No. 00, Month 20XX, 1–9

Lukáš Havrlant (a) and Vladik Kreinovich (b)
(a) Department of Computer Science, Palacky University Olomouc, 17. listopadu 12, CZ Olomouc, Czech Republic, lukashavrlant@upol.cz
(b) Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA

(Received 23 May 2014; accepted XX month 20XX)

In document analysis, an important task is to automatically find keywords which best describe the subject of the document. One of the most widely used techniques for keyword detection is a technique based on the term frequency-inverse document frequency (tf-idf) heuristic. This technique has some explanations, but these explanations are somewhat too complex to be fully convincing. In this paper, we provide a simple probabilistic explanation for the tf-idf heuristic. We also show that the ideas behind this explanation can help us come up with more complex formulas which will hopefully lead to a more adequate detection of keywords.

Keywords: keywords; term frequency-inverse document frequency (tf-idf); probabilistic explanation

1. tf-idf: A Brief Reminder and Formulation of the Problem

How to find keywords: qualitative idea. Given a document, how can we identify its keywords?
In many cases, this is easy: e.g., if we have a text which mentions Turing machines many times, then clearly the term "Turing machine" should be selected as a keyword. This does not mean, of course, that every word which occurs many times in a document is a meaningful keyword: e.g., words like "a" or "the" occur many times in a document, but we should not select them as keywords characterizing a given document, since they occur many times in every document. So, to identify keywords, it is necessary to take into account not only how many times a given word $t$ occurs in a given document $d$, but also how frequently the word $t$ occurs in other documents.

How to find keywords: a (semi-empirical) algorithm. Information retrieval and text mining use a special term frequency-inverse document frequency (tf-idf) algorithm for automatic detection of keywords; see, e.g., Manning, Raghavan, and Schütze (2008); Rajaraman and Ullman (2011). The ideas behind this algorithm were first proposed in Jones (1972). This algorithm uses the following three numerical characteristics:
- the number of times $\mathrm{tf}(t, d)$ that the word $t$ occurs in a document $d$; this number is known as the term frequency;
- the total number $N$ of documents in a given corpus; and
- the number of documents $\mathrm{df}(t)$ which contain the given term $t$; this number is known as the document frequency.

Based on the last two characteristics, the algorithm computes the quantity

$$\mathrm{idf}(t) \stackrel{\text{def}}{=} \ln\left(\frac{N}{\mathrm{df}(t)}\right),$$

known as the inverse document frequency. As keywords characterizing a given document $d$, we then select words $t$ with the largest value of the product

$$\text{tf-idf}(t, d, D) \stackrel{\text{def}}{=} \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D).$$

Remaining challenge. The tf-idf algorithm works reasonably well: in many cases, it leads to an adequate selection of keywords. What is not clear is why the above formula is so successful while other similar formulas (see, e.g., Manning, Raghavan, and Schütze (2008)) are not so successful. There have been several attempts to provide a theoretical explanation for the success of tf-idf (Hiemstra (2000); Jones (1972); Manning, Raghavan, and Schütze (2008); Robertson (2004)), but the resulting explanations are somewhat overcomplicated and not very convincing.

What we do in this paper. In this paper, we provide a simple probabilistic explanation for the tf-idf heuristic. This explanation motivates some modifications of the original tf-idf formulas; we hope that these modifications will be useful too.

2. A Simple Probabilistic Explanation of tf-idf

Simplified model: main idea. Let us denote the total number of occurrences of the word $t$ in the whole corpus $D$ by $\mathrm{tf}(t, D)$. Let us consider a simplified model in which each of these occurrences is randomly assigned (with equal probability) to one of the $N$ documents from the corpus $D$ (and different occurrences are assigned independently from one another). In this model, the probability that each occurrence of the word $t$ is assigned to a given document is equal to $1/N$.

Let us estimate the probability that this simplified model leads to the given number of occurrences. Let us estimate the probability $p$ that after randomly (and independently) assigning all $n \stackrel{\text{def}}{=} \mathrm{tf}(t, D)$ occurrences, the document $d$ will contain $k \stackrel{\text{def}}{=} \mathrm{tf}(t, d)$ occurrences. The smaller this probability, the less probable it is that the text got $k$ occurrences randomly, and thus, the more confident we are that the word $t$ is important for the given document, i.e., that $t$ is one of $d$'s keywords.

Analysis of the simplified model and the resulting formula for the desired probability. Let us start with the case $k = 1$, when the document contains exactly one occurrence of the word $t$. To compute this probability, let us first estimate the probability that the assignment of the first of $n$ words $t$ placed this word into the given document $d$, and all the other $n - 1$ assignments placed the corresponding word in other documents. The probability that, out of $N$ documents, the first assignment is placed into the document $d$ is equal to $1/N$; the probability that each of the next $n - 1$ assignments is placed in one of the other $N - 1$ documents is equal to

$$\frac{N - 1}{N} = 1 - \frac{1}{N}.$$

Since the assignments of different occurrences are independent, the resulting probability of this situation is

$$\frac{1}{N} \cdot \left(1 - \frac{1}{N}\right)^{n-1}.$$

The overall probability that $k = 1$ comes from $n$ such incompatible events: the event that the first occurrence landed in $d$, the event that the second occurrence landed in $d$, etc. Thus, the overall probability $p$ that $k = 1$ is equal to the sum of $n$ such terms, i.e., to

$$p = n \cdot \frac{1}{N} \cdot \left(1 - \frac{1}{N}\right)^{n-1}.$$

For $k = 2$, we can similarly compute the corresponding probability $p$: for each pair of occurrences, the probability that these two occurrences were placed in $d$ and all $n - 2$ others were placed in the other $N - 1$ documents is equal to $\left(\frac{1}{N}\right)^2 \cdot \left(1 - \frac{1}{N}\right)^{n-2}$. Thus, the probability $p$ can be obtained by multiplying this probability by the total number $\binom{n}{2}$ of such pairs:

$$p = \binom{n}{2} \cdot \left(\frac{1}{N}\right)^2 \cdot \left(1 - \frac{1}{N}\right)^{n-2}.$$

Similarly, for a general $k$, for each $k$-tuple of occurrences, the probability that these $k$ occurrences were placed in $d$ and all $n - k$ others were placed in the other $N - 1$ documents is equal to $\left(\frac{1}{N}\right)^k \cdot \left(1 - \frac{1}{N}\right)^{n-k}$. Thus, the probability $p$ can be obtained by multiplying this probability by the total number $\binom{n}{k}$ of such tuples:

$$p = \binom{n}{k} \cdot \left(\frac{1}{N}\right)^k \cdot \left(1 - \frac{1}{N}\right)^{n-k}.$$

Analysis of the problem and the resulting inequalities between $k$, $n$, and $N$. We are interested in the cases when the total number $n$ of occurrences of the word $t$ is much smaller than the total number $N$ of documents in the corpus: $n \ll N$. Indeed, if $n$ is comparable with $N$, as is the case for such words as "a", "the", etc., this means that the word $t$ occurs in a large portion of documents and is, therefore, not typical for the given document $d$, so it cannot serve as one of its keywords.

We are also interested in the cases when the total number $k$ of the occurrences of the term $t$ in the given document is much smaller than its total number $n$ of occurrences in the whole corpus of documents. Let us explain this requirement. Of course, by definition, $k$ is always smaller than or equal to $n$. If $k$ is of the same order as $n$, this means that on average, there are very few documents that contain this term. This can happen, for example, if an author introduced a new technical term in one paper and uses this term in another paper. However, in this case, it is not a good idea to use this new term as a keyword: one of the main purposes of the keyword is to make it clear to the reader what this paper is about. From this viewpoint, using, as a keyword, a term which no one uses (and thus, most probably, no one understands) makes no sense. Thus, keywords are meaningful only if $k \ll n$.

Finally, for a word $t$ to be a reasonable keyword for a document $d$, it should appear several times in the document: $1 \ll k$. Summarizing, when we look for meaningful keywords, we should limit ourselves to cases when

$$1 \ll k \ll n \ll N.$$

The above inequalities help simplify the expression for the probability. Let us show how the above inequalities allow us to simplify the above expression for the probability $p$. Specifically, the above expression represents the probability $p$ as the product of three factors: $\binom{n}{k}$, $\left(\frac{1}{N}\right)^k$, and $\left(1 - \frac{1}{N}\right)^{n-k}$; we will show that the first and the third factors can be simplified.

First, by using the expansion of the function $(1 - x)^{n-k}$ in Taylor series, we get

$$\left(1 - \frac{1}{N}\right)^{n-k} = 1 - (n - k) \cdot \frac{1}{N} + \ldots$$

Since $n \ll N$, we have $n - k \ll N$, hence $(n - k) \cdot \frac{1}{N} \ll 1$ and so, in the first approximation,

$$\left(1 - \frac{1}{N}\right)^{n-k} \approx 1.$$

To simplify the expression for $\binom{n}{k}$, let us use an explicit expression for the number of combinations:

$$\binom{n}{k} = \frac{n \cdot (n - 1) \cdot (n - 2) \cdot \ldots \cdot (n - k + 1)}{1 \cdot 2 \cdot \ldots \cdot k}.$$

This can be equivalently described as

$$\binom{n}{k} = \frac{n^k}{k!} \cdot \left(1 - \frac{1}{n}\right) \cdot \left(1 - \frac{2}{n}\right) \cdot \ldots \cdot \left(1 - \frac{k - 1}{n}\right).$$

Since $k \ll n$, we similarly get $1 - \frac{1}{n} \approx 1$, ..., $1 - \frac{k - 1}{n} \approx 1$, and therefore,

$$\binom{n}{k} \approx \frac{n^k}{k!}.$$
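As a numerical sanity check of the two approximations above, the following Python sketch compares the exact binomial probability with its simplified form. Here `n` stands for the total number of occurrences of the term in the corpus, `N` for the number of documents, and `k` for the number of occurrences in the given document; the function names and the sample sizes are illustrative assumptions and do not come from the paper.

```python
import math


def exact_probability(n: int, N: int, k: int) -> float:
    """Probability that exactly k of n occurrences land in one given document
    when each occurrence is assigned uniformly at random to one of N documents."""
    return math.comb(n, k) * (1.0 / N) ** k * (1.0 - 1.0 / N) ** (n - k)


def approximate_probability(n: int, N: int, k: int) -> float:
    """Same probability with (1 - 1/N)^(n-k) replaced by 1
    and the binomial coefficient replaced by n^k / k!."""
    return (n ** k / math.factorial(k)) * (1.0 / N) ** k


if __name__ == "__main__":
    # Hypothetical sizes chosen so that 1 << k << n << N holds at least roughly.
    n, N, k = 1_000, 1_000_000, 10
    p_exact = exact_probability(n, N, k)
    p_approx = approximate_probability(n, N, k)
    print(f"exact  = {p_exact:.6e}")
    print(f"approx = {p_approx:.6e}")
    print(f"ratio  = {p_approx / p_exact:.4f}")
```

For sizes of this kind the two values agree to within about five percent; the agreement degrades as k approaches n or as n approaches N, which is where the inequalities stop holding.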
Since $k \gg 1$, we can use Stirling's formula (see, e.g., Abramowitz and Stegun (2002)) for the factorial, $k! \approx \left(\frac{k}{e}\right)^k$, so

$$\binom{n}{k} \approx \frac{n^k \cdot e^k}{k^k}.$$

Substituting these two approximate expressions for the factors into the formula for the probability $p$, we get an approximate formula

$$p \approx \frac{n^k \cdot e^k}{k^k} \cdot \left(\frac{1}{N}\right)^k.$$

From probabilities to their logarithms. The corresponding computations can be further simplified if we use logarithms of the probabilities instead of the probabilities themselves. Since the logarithm is monotonic, the use of logarithms does not change which term is more probable and which is less probable; however, since $\ln(a \cdot b) = \ln(a) + \ln(b)$ and $\ln(a^k) = k \cdot \ln(a)$, the use of logarithms replaces multiplication with a computationally simpler addition operation, and raising to the power with a computationally simpler multiplication: this is why logarithms were invented in the first place.

Since the probability $p$ is smaller than 1, its logarithm $\ln(p)$ is negative; to make it more convenient, let us consider its opposite $-\ln(p)$. From the above approximate formula, we conclude that

$$-\ln(p) \approx -k \cdot \ln(n) + k \cdot \ln(k) - k + k \cdot \ln(N) = k \cdot \ln\left(\frac{N}{n}\right) + k \cdot (\ln(k) - 1).$$

Here, $k \gg 1$, so $\ln(k) \gg 1$. Thus, we arrive at the following formula.

Resulting formula.

$$-\ln(p) \approx k \cdot \ln\left(\frac{N}{n}\right) + k \cdot \ln(k).$$

The smaller the probability $p$, the larger this value and therefore, the more probable it is that the word $t$ is one of the keywords describing the document $d$. Thus, as keywords describing a document, we should select all the terms $t$ for which this expression is the largest.

Let us compare this formula with the tf-idf formula. The tf-idf formula has the form

$$k \cdot \ln\left(\frac{N}{\tilde n}\right),$$

where $\tilde n$ is the number of documents that contain the term $t$. Due to the known Zipf's law, most documents which contain the term $t$ contain just one occurrence of it; thus, the overall number $n$ of occurrences of the term $t$ is approximately equal to the number $\tilde n$ of the documents that contain $t$: $n \approx \tilde n$. Hence,

$$-\ln(p) \approx k \cdot \ln\left(\frac{N}{\tilde n}\right) + k \cdot \ln(k).$$
When $k \ll N/\tilde n$, we have $\ln(k) \ll \ln\left(\frac{N}{\tilde n}\right)$, and therefore,

$$-\ln(p) \approx k \cdot \ln\left(\frac{N}{\tilde n}\right).$$

This is exactly the tf-idf formula that we wanted to explain. Thus, we indeed get a simple probabilistic justification of the tf-idf formula.

Beyond explanation, towards a more accurate formula. The above analysis enables us not only to justify the existing semi-heuristic tf-idf formula; we can also provide a new formula which more accurately describes the probabilistic meaning and which, we hope, will be even more adequate in selecting keywords. Namely, instead of selecting keywords based on the tf-idf product expression, we should select keywords based on the value

$$\mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D) + \mathrm{tf}(t, d) \cdot \ln(\mathrm{tf}(t, d)),$$

where the new measure of inverse document frequency is defined as

$$\mathrm{idf}(t) \stackrel{\text{def}}{=} \ln\left(\frac{N}{\mathrm{df}(t)}\right),$$

and the new document frequency $\mathrm{df}(t)$ is defined as the total number of occurrences of the term $t$ in the whole corpus of documents (rather than as the number of documents that contain $t$).

Comment. We can get an even more accurate description of the probability if we consider a more realistic (and, thus, more complex) probabilistic model.

3. A More Realistic Probabilistic Model and the Resulting Modification of tf-idf

Towards a more accurate model. In the above simplified model, we treated all the documents in the corpus equally. In practice, some documents are longer and some are shorter. Clearly, if a document is longer, it has a higher probability of containing several occurrences of the given term $t$. Let us show how we can take this fact into account.

Resulting model. We want to take into account that different documents have different numbers of words. Let us denote the total number of words in a document $d$ by $w(d)$; we will call this number the length of the document $d$. Let $W$ be the total number of words in all the documents in the given corpus. Out of these $W$ words, we have $\tilde n = \mathrm{df}(t)$ occurrences of each word $t$. Thus, the probability $p(t)$ that a randomly selected word is an occurrence of the word $t$ is equal to the ratio

$$p(t) = \frac{\mathrm{df}(t)}{W}.$$

The corresponding probabilistic model is straightforward: into each of the $W$ word locations, we place a term $t$ with probability $p(t)$, and assignments corresponding to different locations are independent. How probable is it that, as a result of this random assignment, in a document $d$ with $w(d)$ words, we will get $\mathrm{tf}(t, d)$ occurrences of $t$? The lower the probability of this result, the more probable it is that the word $t$ is one of the keywords of the document $d$.
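Under this model the number of occurrences of the term in a document follows a binomial distribution with w(d) trials and success probability p(t), so the question above can be answered directly for concrete numbers. The sketch below is one possible way to evaluate the negative log-probability in log space; the function name and the sample numbers are hypothetical and only serve as an illustration.

```python
import math


def neg_log_binomial_pmf(tf: int, doc_length: int, p_t: float) -> float:
    """Return -ln P(X = tf) for X ~ Binomial(doc_length, p_t), with 0 < p_t < 1.

    Working with log-gamma avoids overflow in the binomial coefficient
    for long documents.
    """
    log_comb = (
        math.lgamma(doc_length + 1)
        - math.lgamma(tf + 1)
        - math.lgamma(doc_length - tf + 1)
    )
    log_pmf = log_comb + tf * math.log(p_t) + (doc_length - tf) * math.log1p(-p_t)
    return -log_pmf


if __name__ == "__main__":
    # Hypothetical numbers: a 1000-word document and a term with p(t) = 0.002,
    # so about 2 occurrences are expected by chance.
    for tf in (2, 6, 12):
        print(tf, round(neg_log_binomial_pmf(tf, 1000, 0.002), 3))
```

The larger the returned value, the less plausible it is that the observed count arose by chance, which is the sense in which the term becomes a keyword candidate.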
Analysis of the probabilistic model. After the above-described random assignment, the resulting number of occurrences of $t$ can be computed as the sum

$$\mathrm{tf}(t, d) = x_1 + \ldots + x_{w(d)},$$

where $x_i = 1$ if the word at the $i$-th location is $t$ and $x_i = 0$ if the word at the $i$-th location is different from $t$. Since assignments corresponding to different locations are independent and identically distributed, the value $\mathrm{tf}(t, d)$ is thus equal to the sum of $w(d)$ independent identically distributed random variables.

A document usually contains a reasonably large number of words; so, to describe the probability distribution of the value $\mathrm{tf}(t, d)$, we can use the Central Limit Theorem, according to which the probability distribution of the sum of many independent identically distributed random variables is close to Gaussian (normal); see, e.g., Sheskin (2011).

A normal distribution is uniquely determined by its mean $\mu$ and its variance $V = \sigma^2$. When we add independent random variables, their means add and their variances add. Thus, for the sum of $w(d)$ independent identically distributed random variables $x_i$, we get $\mu = w(d) \cdot \mu_i$, where $\mu_i$ is the mean of each of the variables $x_i$, and $V = w(d) \cdot V_i$, where $V_i$ is the variance of the variable $x_i$.

Each variable $x_i$ has two possible values $v_j$: the value $v_1 = 1$ with probability $p_1 = p(t)$, and the value $v_0 = 0$ with the remaining probability $p_0 = 1 - p(t)$. Thus,

$$\mu_i = \sum_j p_j \cdot v_j = p(t) \cdot 1 + (1 - p(t)) \cdot 0 = p(t).$$

Similarly,

$$V_i = \sum_j p_j \cdot (v_j - \mu_i)^2 = p(t) \cdot (1 - p(t))^2 + (1 - p(t)) \cdot (0 - p(t))^2 = p(t) \cdot (1 - p(t))^2 + (1 - p(t)) \cdot p(t)^2.$$

The two terms in the right-hand side have a common factor, so

$$V_i = p(t) \cdot (1 - p(t)) \cdot ((1 - p(t)) + p(t)) = p(t) \cdot (1 - p(t)).$$

Thus,

$$\mu = w(d) \cdot \mu_i = w(d) \cdot p(t); \qquad V = w(d) \cdot p(t) \cdot (1 - p(t)).$$

For a normal distribution, possible values are values within the interval $[\mu - k_0 \cdot \sigma, \mu + k_0 \cdot \sigma]$, where $k_0$ is usually 2, 3, or 6. The larger $k_0$, the less probable it is for the corresponding value to appear. For a given value $x$, the corresponding value $k_0$ is determined by the equality $\mu \pm k_0 \cdot \sigma = x$, so $k_0 = \pm \frac{x - \mu}{\sigma}$, and

$$k_0 = \frac{|x - \mu|}{\sigma}.$$
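The moment computations above are easy to verify numerically. The following sketch recomputes the mean and variance of a single indicator variable from its definition and then evaluates k_0 for an observed count; the function names and the example numbers are assumptions made for illustration only.

```python
import math


def indicator_moments(p_t: float) -> tuple[float, float]:
    """Mean and variance of x_i, which is 1 with probability p_t and 0 otherwise,
    computed directly from the definitions as sums over the two values."""
    outcomes = [(1.0, p_t), (0.0, 1.0 - p_t)]  # (value v_j, probability p_j)
    mean = sum(prob * value for value, prob in outcomes)
    variance = sum(prob * (value - mean) ** 2 for value, prob in outcomes)
    return mean, variance


def deviation_in_sigmas(tf: int, doc_length: int, p_t: float) -> float:
    """k_0 = |x - mu| / sigma for x = tf, with mu = w(d) * p(t)
    and sigma^2 = w(d) * p(t) * (1 - p(t))."""
    mean_i, var_i = indicator_moments(p_t)
    mu = doc_length * mean_i
    sigma = math.sqrt(doc_length * var_i)
    return abs(tf - mu) / sigma


if __name__ == "__main__":
    print(indicator_moments(0.2))
    # Hypothetical case: 12 occurrences in a 1000-word document, p(t) = 0.002.
    print(deviation_in_sigmas(12, 1000, 0.002))
```

In the hypothetical case the expected count is 2, so 12 occurrences lie several standard deviations above the mean.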
For $x = \mathrm{tf}(t, d)$, the resulting ratio is equal to

$$\frac{|\mathrm{tf}(t, d) - w(d) \cdot p(t)|}{\sqrt{w(d) \cdot p(t) \cdot (1 - p(t))}}.$$

Let us simplify this formula. A keyword should occur much more frequently in this document than it occurs in the corpus in general. Thus, when we look for keywords, we are interested only in the words for which $\frac{\mathrm{tf}(t, d)}{w(d)} \gg p(t)$. For such words, $\mathrm{tf}(t, d) \gg w(d) \cdot p(t)$; thus, $\mathrm{tf}(t, d) - w(d) \cdot p(t) \approx \mathrm{tf}(t, d)$ and therefore, the above formula gets a simplified form

$$\frac{\mathrm{tf}(t, d)}{\sqrt{w(d) \cdot p(t) \cdot (1 - p(t))}}.$$

Also, as we have discussed earlier, as meaningful keywords, we cannot take words like "a" or "the" which occur frequently in all the documents. Thus, meaningful keywords should be relatively rare: we should have $p(t) \ll 1$. For such words, $1 - p(t) \approx 1$, and we get an even simpler formula for the resulting criterion:

$$\frac{\mathrm{tf}(t, d)}{\sqrt{w(d) \cdot p(t)}}.$$

Substituting the expression for $p(t)$ into this formula, we get the following final expression.

Resulting formula. As keywords corresponding to the document $d$, we should select words $t$ for which the following value is the largest:

$$\sqrt{\frac{W}{\mathrm{df}(t) \cdot w(d)}} \cdot \mathrm{tf}(t, d).$$

Let us compare the new formula with the tf-idf expression. The tf-idf formula corresponds to the case when we ignore the fact that different documents have different lengths, i.e., in effect, assume that all the documents have the same length. If $w(d) = \text{const}$, then the ratio $\frac{W}{w(d)}$ is simply the total number $N$ of the documents, and the above formula takes the form

$$\mathrm{tf}(t, d) \cdot \sqrt{\frac{N}{\mathrm{df}(t)}}.$$

This formula is very similar to tf-idf (to be more precise, it is similar to the modification of tf-idf that we described in the previous section); the main difference is that, instead of the logarithm of the inverse document frequency, we take the square root.
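To make the comparison concrete, the sketch below scores the terms of one document with three criteria: classical tf-idf (with the document frequency counted as the number of documents containing the term), the logarithmic modification from Section 2, and the square-root criterion from Section 3. For the last two, the document frequency is taken to be the total number of occurrences in the corpus, which is our reading of the text. The function name, the whitespace tokenization, and the toy corpus are assumptions; a corpus this small does not satisfy the inequalities assumed in the derivation, so the numbers only illustrate the mechanics.

```python
import math
from collections import Counter


def keyword_scores(corpus: list[list[str]], doc_index: int) -> dict[str, tuple[float, float, float]]:
    """Return {term: (tfidf, log_variant, sqrt_variant)} for one document.

    corpus is a list of documents, each given as a list of tokens.
    """
    num_docs = len(corpus)                         # N
    total_words = sum(len(doc) for doc in corpus)  # W
    docs_containing = Counter()                    # classical df(t)
    occurrences = Counter()                        # total occurrences of t in the corpus
    for doc in corpus:
        occurrences.update(doc)
        docs_containing.update(set(doc))

    doc = corpus[doc_index]
    doc_length = len(doc)                          # w(d)
    scores = {}
    for term, tf in Counter(doc).items():
        tfidf = tf * math.log(num_docs / docs_containing[term])
        log_variant = tf * math.log(num_docs / occurrences[term]) + tf * math.log(tf)
        sqrt_variant = tf * math.sqrt(total_words / (occurrences[term] * doc_length))
        scores[term] = (tfidf, log_variant, sqrt_variant)
    return scores


if __name__ == "__main__":
    toy_corpus = [
        "the turing machine reads the tape and the turing machine halts".split(),
        "the cat sat on the mat".split(),
        "a dog chased the cat".split(),
        "the tape was sticky".split(),
    ]
    for term, values in sorted(keyword_scores(toy_corpus, 0).items()):
        print(term, [round(v, 3) for v in values])
```

On this toy corpus all three criteria rank "turing" above "the" for the first document, although the logarithmic variant can become negative for very frequent words when the corpus is this small.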
Acknowledgements

This work was supported in part by the National Science Foundation grants HRD and HRD (Cyber-ShARE Center of Excellence) and DUE. This work was done when L. Havrlant was visiting the University of Texas at El Paso, a visit supported by the Palacky University.

References

Abramowitz, M., and I. Stegun. 2002. Handbook of Mathematical Functions, New York: Dover Publications.
Hiemstra, D. 2000. "A probabilistic justification for using tf × idf term weighting in information retrieval," International Journal of Digital Libraries 3.
Jones, K. S. 1972. "A statistical interpretation of term specificity and its application in retrieval," Journal of Documentation 28 (1): 11–21; reprinted in 2004, 60 (5).
Manning, C. D., P. Raghavan, and H. Schütze. 2008. Introduction to Information Retrieval, New York: Cambridge University Press.
Rajaraman, A., and J. D. Ullman. 2011. Mining of Massive Datasets, Cambridge, UK: Cambridge University Press.
Robertson, S. 2004. "Understanding inverse document frequency: on theoretical arguments for idf," Journal of Documentation 60 (5).
Sheskin, D. J. 2011. Handbook of Parametric and Nonparametric Statistical Procedures, Boca Raton, Florida: Chapman & Hall/CRC.
September 1, 14 Rotatioally ivariat itegrals of arbitrary dimesios James D. Wells Physics Departmet, Uiversity of Michiga, A Arbor Abstract: I this ote itegrals over spherical volumes with rotatioally
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More informationCHAPTER 2. Mean This is the usual arithmetic mean or average and is equal to the sum of the measurements divided by number of measurements.
CHAPTER 2 umerical Measures Graphical method may ot always be sufficiet for describig data. You ca use the data to calculate a set of umbers that will covey a good metal picture of the frequecy distributio.
More information1 Review of Probability & Statistics
1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5
More informationIP Reference guide for integer programming formulations.
IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more
More informationSection 4.3. Boolean functions
Sectio 4.3. Boolea fuctios Let us take aother look at the simplest o-trivial Boolea algebra, ({0}), the power-set algebra based o a oe-elemet set, chose here as {0}. This has two elemets, the empty set,
More informationMath 10A final exam, December 16, 2016
Please put away all books, calculators, cell phoes ad other devices. You may cosult a sigle two-sided sheet of otes. Please write carefully ad clearly, USING WORDS (ot just symbols). Remember that the
More informationLecture 2 February 8, 2016
MIT 6.854/8.45: Advaced Algorithms Sprig 206 Prof. Akur Moitra Lecture 2 February 8, 206 Scribe: Calvi Huag, Lih V. Nguye I this lecture, we aalyze the problem of schedulig equal size tasks arrivig olie
More information(3) If you replace row i of A by its sum with a multiple of another row, then the determinant is unchanged! Expand across the i th row:
Math 5-4 Tue Feb 4 Cotiue with sectio 36 Determiats The effective way to compute determiats for larger-sized matrices without lots of zeroes is to ot use the defiitio, but rather to use the followig facts,
More informationDouble Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution
Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.
More informationt distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference
EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The
More informationTHE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS
THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS DEMETRES CHRISTOFIDES Abstract. Cosider a ivertible matrix over some field. The Gauss-Jorda elimiatio reduces this matrix to the idetity
More informationElement sampling: Part 2
Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationCHAPTER I: Vector Spaces
CHAPTER I: Vector Spaces Sectio 1: Itroductio ad Examples This first chapter is largely a review of topics you probably saw i your liear algebra course. So why cover it? (1) Not everyoe remembers everythig
More informationUnderstanding Samples
1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We
More informationMA131 - Analysis 1. Workbook 9 Series III
MA3 - Aalysis Workbook 9 Series III Autum 004 Cotets 4.4 Series with Positive ad Negative Terms.............. 4.5 Alteratig Series.......................... 4.6 Geeral Series.............................
More informationThe Binomial Theorem
The Biomial Theorem Robert Marti Itroductio The Biomial Theorem is used to expad biomials, that is, brackets cosistig of two distict terms The formula for the Biomial Theorem is as follows: (a + b ( k
More information1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable
More informationConvergence of random variables. (telegram style notes) P.J.C. Spreij
Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space
More informationSequences of Definite Integrals, Factorials and Double Factorials
47 6 Joural of Iteger Sequeces, Vol. 8 (5), Article 5.4.6 Sequeces of Defiite Itegrals, Factorials ad Double Factorials Thierry Daa-Picard Departmet of Applied Mathematics Jerusalem College of Techology
More informationCSE 527, Additional notes on MLE & EM
CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More information1 Inferential Methods for Correlation and Regression Analysis
1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet
More informationSequences. Notation. Convergence of a Sequence
Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it
More informationConfidence intervals summary Conservative and approximate confidence intervals for a binomial p Examples. MATH1005 Statistics. Lecture 24. M.
MATH1005 Statistics Lecture 24 M. Stewart School of Mathematics ad Statistics Uiversity of Sydey Outlie Cofidece itervals summary Coservative ad approximate cofidece itervals for a biomial p The aïve iterval
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More information4. Partial Sums and the Central Limit Theorem
1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems
More informationBernoulli numbers and the Euler-Maclaurin summation formula
Physics 6A Witer 006 Beroulli umbers ad the Euler-Maclauri summatio formula I this ote, I shall motivate the origi of the Euler-Maclauri summatio formula. I will also explai why the coefficiets o the right
More informationw (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.
2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationTopic 5: Basics of Probability
Topic 5: Jue 1, 2011 1 Itroductio Mathematical structures lie Euclidea geometry or algebraic fields are defied by a set of axioms. Mathematical reality is the developed through the itroductio of cocepts
More informationGoodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)
Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................
More informationAnna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2
Aa Jaicka Mathematical Statistics 18/19 Lecture 1, Parts 1 & 1. Descriptive Statistics By the term descriptive statistics we will mea the tools used for quatitative descriptio of the properties of a sample
More informationSeunghee Ye Ma 8: Week 5 Oct 28
Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value
More informationMedian and IQR The median is the value which divides the ordered data values in half.
STA 666 Fall 2007 Web-based Course Notes 4: Describig Distributios Numerically Numerical summaries for quatitative variables media ad iterquartile rage (IQR) 5-umber summary mea ad stadard deviatio Media
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationADVANCED SOFTWARE ENGINEERING
ADVANCED SOFTWARE ENGINEERING COMP 3705 Exercise Usage-based Testig ad Reliability Versio 1.0-040406 Departmet of Computer Ssciece Sada Narayaappa, Aeliese Adrews Versio 1.1-050405 Departmet of Commuicatio
More informationBounds for the Positive nth-root of Positive Integers
Pure Mathematical Scieces, Vol. 6, 07, o., 47-59 HIKARI Ltd, www.m-hikari.com https://doi.org/0.988/pms.07.7 Bouds for the Positive th-root of Positive Itegers Rachid Marsli Mathematics ad Statistics Departmet
More informationThe Riemann Zeta Function
Physics 6A Witer 6 The Riema Zeta Fuctio I this ote, I will sketch some of the mai properties of the Riema zeta fuctio, ζ(x). For x >, we defie ζ(x) =, x >. () x = For x, this sum diverges. However, we
More informationProbability and statistics: basic terms
Probability ad statistics: basic terms M. Veeraraghava August 203 A radom variable is a rule that assigs a umerical value to each possible outcome of a experimet. Outcomes of a experimet form the sample
More informationNICK DUFRESNE. 1 1 p(x). To determine some formulas for the generating function of the Schröder numbers, r(x) = a(x) =
AN INTRODUCTION TO SCHRÖDER AND UNKNOWN NUMBERS NICK DUFRESNE Abstract. I this article we will itroduce two types of lattice paths, Schröder paths ad Ukow paths. We will examie differet properties of each,
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More information