A Simple Probabilistic Explanation of Term Frequency-Inverse Document Frequency (tf-idf) Heuristic (and Variations Motivated by This Explanation)

University of Texas at El Paso
DigitalCommons@UTEP
Departmental Technical Reports (CS), Department of Computer Science

Lukas Havrlant, Palacky University Olomouc, lukashavrlant@upol.cz
Vladik Kreinovich, University of Texas at El Paso, vladik@utep.edu

Part of the Computer Sciences Commons.
Comments: Technical Report: UTEP-CS

Recommended Citation: Havrlant, Lukas and Kreinovich, Vladik, "A Simple Probabilistic Explanation of Term Frequency-Inverse Document Frequency (tf-idf) Heuristic (and Variations Motivated by This Explanation)" (2014). Departmental Technical Reports (CS). Paper

This Article is brought to you for free and open access by the Department of Computer Science at DigitalCommons@UTEP. It has been accepted for inclusion in Departmental Technical Reports (CS) by an authorized administrator of DigitalCommons@UTEP. For more information, please contact lweber@utep.edu.

To appear in the International Journal of General Systems, Vol. 00, No. 00, Month 20XX, 1–9

Lukáš Havrlant (a) and Vladik Kreinovich (b)

(a) Department of Computer Science, Palacky University Olomouc, 17. listopadu 12, CZ Olomouc, Czech Republic, lukashavrlant@upol.cz; (b) Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA

(Received 23 May 2014; accepted XX month 20XX)

In document analysis, an important task is to automatically find keywords which best describe the subject of the document. One of the most widely used techniques for keyword detection is a technique based on the term frequency-inverse document frequency (tf-idf) heuristic. This technique has some explanations, but these explanations are somewhat too complex to be fully convincing. In this paper, we provide a simple probabilistic explanation for the tf-idf heuristic. We also show that the ideas behind this explanation can help us come up with more complex formulas which will hopefully lead to a more adequate detection of keywords.

Keywords: keywords; term frequency-inverse document frequency (tf-idf); probabilistic explanation

1 tf-idf: A Brief Reminder and Formulation of the Problem

How to find keywords: qualitative idea. Given a document, how can we identify its keywords?
In many cases, this is easy: e.g., if we have a text which mentions Turing machines many times, then clearly the term "Turing machine" should be selected as a keyword. This does not mean, of course, that every word which occurs many times in a document is a meaningful keyword: e.g., words like "a" or "the" occur many times in a document, but we should not select them as keywords characterizing a given document, since they occur many times in every document. So, to identify keywords, it is necessary to take into account not only how many times a given word t occurs in a given document d, but also how frequently the word t occurs in other documents.

How to find keywords: a (semi-empirical) algorithm. Information retrieval and text mining use a special term frequency-inverse document frequency (tf-idf) algorithm for automatic detection of keywords; see, e.g., Manning, Raghavan, and Schütze (2008); Rajaraman and Ullman (2011). The ideas behind this algorithm were first proposed in Jones (1972). This algorithm uses the following three numerical characteristics:

• the number of times tf(t, d) that the word t occurs in a document d; this number is known as the term frequency;
• the total number N of documents in a given corpus D; and
• the number of documents df(t) which contain the given term t; this number is known as the document frequency.

Based on the last two characteristics, the algorithm computes the quantity

$$\mathrm{idf}(t) \stackrel{\text{def}}{=} \ln\left(\frac{N}{\mathrm{df}(t)}\right),$$

known as the inverse document frequency. As keywords characterizing a given document d, we then select words t with the largest value of the product

$$\text{tf-idf}(t, d, D) \stackrel{\text{def}}{=} \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D).$$

Remaining challenge. The tf-idf algorithm works reasonably well: in many cases, it leads to an adequate selection of keywords. What is not clear is why the above formula is so successful while other similar formulas (see, e.g., Manning, Raghavan, and Schütze (2008)) are not so successful. There have been several attempts to provide a theoretical explanation for the success of tf-idf (Hiemstra (2000); Jones (1972); Manning, Raghavan, and Schütze (2008); Robertson (2004)), but the resulting explanations are somewhat overcomplicated and not very convincing.

What we do in this paper. In this paper, we provide a simple probabilistic explanation for the tf-idf heuristic. This explanation motivates some modifications of the original tf-idf formulas; we hope that these modifications will be useful too.

2 A Simple Probabilistic Explanation of tf-idf

Simplified model: main idea. Let us denote the total number of occurrences of the word t in the whole corpus D by tf(t, D). Let us consider a simplified model in which each of these occurrences is randomly assigned (with equal probability) to one of the N documents from the corpus D, and different occurrences are assigned independently from one another. In this model, the probability that each occurrence of the word t is assigned to a given document is equal to 1/N.

Let us estimate the probability that this simplified model leads to the given number of occurrences. Specifically, let us estimate the probability p that after randomly (and independently) assigning all $n \stackrel{\text{def}}{=} \mathrm{tf}(t, D)$ occurrences, the document d will contain $k \stackrel{\text{def}}{=} \mathrm{tf}(t, d)$ occurrences. The smaller this probability, the less probable it is that the text got k occurrences by chance, and thus, the more confident we are that the word t is important for the given document, i.e., that t is one of d's keywords.

Analysis of the simplified model and the resulting formula for the desired probability. Let us start with the case k = 1, when the document contains exactly one occurrence of the word t. To compute this probability, let us first estimate the probability that the assignment of the first of the n words t placed this word into the given document d, and all the other n − 1 assignments placed the corresponding word into other documents. The probability that, out of N documents, the first assignment is placed into the document d is equal to 1/N; the probability that each of the next n − 1 assignments is placed into one of the other N − 1 documents is equal to

$$\frac{N-1}{N} = 1 - \frac{1}{N}.$$

Since the assignments of different occurrences are independent, the resulting probability of this situation is

$$\frac{1}{N} \cdot \left(1 - \frac{1}{N}\right)^{n-1}.$$

The overall probability that k = 1 comes from n such incompatible events: the event that the first occurrence landed in d, the event that the second occurrence landed in d, etc. Thus, the overall probability p that k = 1 is equal to the sum of n such terms, i.e., to

$$p = n \cdot \frac{1}{N} \cdot \left(1 - \frac{1}{N}\right)^{n-1}.$$

For k = 2, we can similarly compute the corresponding probability p: for each pair of occurrences, the probability that these two occurrences were placed in d and all n − 2 others were placed in the other N − 1 documents is equal to

$$\left(\frac{1}{N}\right)^{2} \cdot \left(1 - \frac{1}{N}\right)^{n-2}.$$

Thus, the probability p can be obtained by multiplying this probability by the total number $\binom{n}{2}$ of such pairs:

$$p = \binom{n}{2} \cdot \left(\frac{1}{N}\right)^{2} \cdot \left(1 - \frac{1}{N}\right)^{n-2}.$$

Similarly, for a general k, for each k-tuple of occurrences, the probability that these k occurrences were placed in d and all n − k others were placed in the other N − 1 documents is equal to

$$\left(\frac{1}{N}\right)^{k} \cdot \left(1 - \frac{1}{N}\right)^{n-k}.$$

Thus, the probability p can be obtained by multiplying this probability by the total number $\binom{n}{k}$ of such tuples:

$$p = \binom{n}{k} \cdot \left(\frac{1}{N}\right)^{k} \cdot \left(1 - \frac{1}{N}\right)^{n-k}.$$

Analysis of the problem and the resulting inequalities between k, n, and N. We are interested in the cases when the total number n of occurrences of the word t is much smaller than the total number N of documents in the corpus: n ≪ N. Indeed, if n is comparable with N, as is the case for such words as "a", "the", etc., this means that the word t occurs in a large portion of documents and is, therefore, not typical for the given document d, so it cannot serve as one of its keywords.

We are also interested in the cases when the number k of occurrences of the term t in the given document is much smaller than its total number n of occurrences in the whole corpus of documents. Let us explain this requirement. Of course, by definition, k is always smaller than or equal to n. If k is of the same order as n, this means that very few documents other than d contain this term. This can happen, for example, if an author introduced a new technical term in one paper and uses this term in another paper. However, in this case, it is not a good idea to use this new term as a keyword: one of the main purposes of a keyword is to make it clear to the reader what the paper is about. From this viewpoint, using as a keyword a term which no one uses, and thus, most probably, no one understands, makes no sense. Thus, keywords are meaningful only if k ≪ n.

Finally, for a word t to be a reasonable keyword for a document d, it should appear several times in the document: 1 ≪ k. Summarizing: when we look for meaningful keywords, we should limit ourselves to cases when

$$1 \ll k \ll n \ll N.$$

The above inequalities help simplify the expression for the probability. Let us show how the above inequalities allow us to simplify the above expression for the probability p. Specifically, the above expression represents the probability p as the product of three factors: $\binom{n}{k}$, $(1/N)^k$, and $(1 - 1/N)^{n-k}$; we will show that the first and the third factors can be simplified.

First, by using the expansion of the function $(1 - x)^{n-k}$ in Taylor series, we get

$$\left(1 - \frac{1}{N}\right)^{n-k} = 1 - (n-k) \cdot \frac{1}{N} + \ldots$$

Since n ≪ N, we have n − k ≪ N, hence $(n-k)/N \ll 1$, and so, in the first approximation,

$$\left(1 - \frac{1}{N}\right)^{n-k} \approx 1.$$

To simplify the expression for $\binom{n}{k}$, let us use an explicit expression for the number of combinations:

$$\binom{n}{k} = \frac{n \cdot (n-1) \cdot (n-2) \cdots (n-k+1)}{1 \cdot 2 \cdots k}.$$

This can be equivalently described as

$$\binom{n}{k} = \frac{n^k}{k!} \cdot \left(1 - \frac{1}{n}\right) \cdot \left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k-1}{n}\right).$$

Since k ≪ n, we similarly get $1 - 1/n \approx 1, \ldots, 1 - (k-1)/n \approx 1$, and therefore,

$$\binom{n}{k} \approx \frac{n^k}{k!}.$$
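The exact binomial expression for p and the two factor approximations can be checked numerically. The following is a minimal Python sketch (assuming Python 3.9 or later, for `math.comb` and built-in generic type hints); the function names `exact_probability` and `factor_checks` and the sample values of k, n, and N are hypothetical illustrations, not taken from the paper.

```python
import math


def exact_probability(k: int, n: int, N: int) -> float:
    """p = C(n, k) * (1/N)**k * (1 - 1/N)**(n - k): the chance that exactly k of
    the n occurrences of t land in a fixed document d when each occurrence picks
    one of N documents uniformly and independently."""
    return math.comb(n, k) * (1.0 / N) ** k * (1.0 - 1.0 / N) ** (n - k)


def factor_checks(k: int, n: int, N: int) -> tuple[float, float]:
    """Return the third factor (1 - 1/N)**(n - k) and the ratio C(n, k) / (n**k / k!).
    The text argues that both are close to 1 when 1 << k << n << N."""
    third_factor = (1.0 - 1.0 / N) ** (n - k)
    binom_ratio = math.comb(n, k) / (n ** k / math.factorial(k))
    return third_factor, binom_ratio


# Hypothetical sample values chosen to respect 1 << k << n << N.
k, n, N = 5, 1_000, 1_000_000
print(exact_probability(k, n, N))
print(factor_checks(k, n, N))
```

Summing `exact_probability(j, n, N)` over all j from 0 to n gives 1, as it should for a probability distribution over the possible counts.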

Since k ≫ 1, we can use the Stirling formula (see, e.g., Abramowitz and Stegun (2002)) for the factorial in its leading-order form, ignoring the factor $\sqrt{2\pi k}$: $k! \approx (k/e)^k$, so

$$\binom{n}{k} \approx \frac{e^k \cdot n^k}{k^k}.$$

Substituting these two approximate expressions for the factors into the formula for the probability p, we get an approximate formula

$$p \approx \frac{e^k \cdot n^k}{k^k} \cdot \left(\frac{1}{N}\right)^{k}.$$

From probabilities to their logarithms. The corresponding computations can be further simplified if we use logarithms of the probabilities instead of the probabilities themselves. Since the logarithm is monotonic, using logarithms instead of probabilities does not change which event is more probable and which is less probable; however, since ln(a · b) = ln(a) + ln(b) and ln(a^k) = k · ln(a), the use of logarithms replaces multiplication with a computationally simpler addition operation, and raising to a power with a computationally simpler multiplication (this is why logarithms were invented in the first place).

Since the probability p is smaller than 1, its logarithm ln(p) is negative; to make it more convenient, let us consider its opposite −ln(p). From the above approximate formula, we conclude that

$$-\ln(p) \approx -k + k \cdot \ln(k) - k \cdot \ln(n) + k \cdot \ln(N) = k \cdot \ln\left(\frac{N}{n}\right) + k \cdot (\ln(k) - 1).$$

Here, k ≫ 1, so ln(k) ≫ 1. Thus, we arrive at the following formula.

Resulting formula.

$$-\ln(p) \approx k \cdot \ln\left(\frac{N}{n}\right) + k \cdot \ln(k).$$

The smaller the probability p, the larger this value, and therefore, the more probable it is that the word t is one of the keywords describing the document d. Thus, as keywords describing a document, we should select the terms t for which this expression is the largest.

Let us compare this formula with the tf-idf formula. The tf-idf formula has the form

$$k \cdot \ln\left(\frac{N}{\tilde n}\right),$$

where ñ is the number of documents that contain the term t. Due to the known Zipf's law, most documents contain just one occurrence of the term t; thus, the overall number n of occurrences of the term t is approximately equal to the number ñ of the documents that contain t: n ≈ ñ. Hence,

$$-\ln(p) \approx k \cdot \ln\left(\frac{N}{\tilde n}\right) + k \cdot \ln(k).$$
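To see how much each simplification moves the value of −ln(p), a minimal sketch can compare the exact value with the Stirling-based approximation k·ln(N/n) + k·(ln(k) − 1), with the "Resulting formula", and with the classic tf-idf term. The sketch assumes hypothetical values of k, n, and N in the regime 1 ≪ k ≪ n ≪ N and takes ñ ≈ n, as in the Zipf's-law argument; the function names are illustrative, and the exact value is computed in log space with `math.lgamma` so that tiny probabilities do not underflow.

```python
import math


def neg_log_p_exact(k: int, n: int, N: int) -> float:
    """-ln p for p = C(n, k) (1/N)^k (1 - 1/N)^(n - k), evaluated in log space."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_p = log_comb - k * math.log(N) + (n - k) * math.log1p(-1.0 / N)
    return -log_p


def neg_log_p_stirling(k: int, n: int, N: int) -> float:
    """k ln(N/n) + k (ln k - 1): the approximation obtained from k! ~ (k/e)^k."""
    return k * math.log(N / n) + k * (math.log(k) - 1.0)


def resulting_formula(k: int, n: int, N: int) -> float:
    """k ln(N/n) + k ln k: the -1 is dropped because ln k >> 1 is assumed."""
    return k * math.log(N / n) + k * math.log(k)


def tf_idf_term(k: int, n_tilde: int, N: int) -> float:
    """Classic tf-idf term k ln(N / n~), with n~ the number of documents containing t."""
    return k * math.log(N / n_tilde)


# Hypothetical values in the regime 1 << k << n << N, taking n~ ~ n (Zipf's law).
k, n, N = 20, 10_000, 10_000_000
for f in (neg_log_p_exact, neg_log_p_stirling, resulting_formula):
    print(f.__name__, round(f(k, n, N), 2))
print("tf_idf_term", round(tf_idf_term(k, n, N), 2))
```

With such values, the Stirling-based approximation stays within a few percent of the exact −ln(p), while the later simplifications trade accuracy for simpler formulas.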

When k is small enough that ln(k) ≪ ln(N/ñ), the term k · ln(k) can be neglected, and therefore,

$$-\ln(p) \approx k \cdot \ln\left(\frac{N}{\tilde n}\right).$$

This is exactly the tf-idf formula that we wanted to explain. Thus, we indeed get a simple probabilistic justification of the tf-idf formula.

Beyond explanation: towards a more accurate formula. The above analysis enables us not only to justify the existing semi-heuristic tf-idf formula; it also provides a new formula which more accurately describes the probabilistic meaning and which, we hope, will be even more adequate in selecting keywords. Namely, instead of selecting keywords based on the tf-idf product expression, we should select keywords based on the value

$$\mathrm{tf}(t, d) \cdot \widetilde{\mathrm{idf}}(t, D) + \mathrm{tf}(t, d) \cdot \ln(\mathrm{tf}(t, d)),$$

where the new measure of inverse document frequency is defined as

$$\widetilde{\mathrm{idf}}(t) \stackrel{\text{def}}{=} \ln\left(\frac{N}{\widetilde{\mathrm{df}}(t)}\right),$$

and the new document frequency $\widetilde{\mathrm{df}}(t)$ is defined as the total number of occurrences of the term t in the whole corpus of documents.

Comment. We can get an even more accurate description of the probability if we consider a more realistic (and, thus, more complex) probabilistic model.

3 A More Realistic Probabilistic Model and the Resulting Modification of tf-idf

Towards a more accurate model. In the above simplified model, we treated all the documents in the corpus equally. In practice, some documents are longer and some are shorter. Clearly, if a document is longer, it has a higher probability to contain several occurrences of the given term t. Let us show how we can take this fact into account.

Resulting model. We want to take into account that different documents have different numbers of words. Let us denote the total number of words in a document d by w(d); we will call this number the length of the document d. Let W be the total number of words in all the documents in the given corpus. Out of these W words, $n = \widetilde{\mathrm{df}}(t)$ are occurrences of the word t. Thus, the probability p(t) that a randomly selected word is an occurrence of the word t is equal to the ratio

$$p(t) = \frac{\widetilde{\mathrm{df}}(t)}{W}.$$

The corresponding probabilistic model is straightforward: into each of the W word locations, we place the term t with probability p(t), and assignments corresponding to different locations are independent. How probable is it that, as a result of this random assignment, a document d with w(d) words will contain tf(t, d) occurrences of t? The lower the probability of this result, the more probable it is that the word t is one of the keywords of the document d.
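Under this model, the count of t in d is a binomial random variable with w(d) trials and success probability p(t), so the question above can be answered exactly before any normal approximation. A minimal sketch, with hypothetical function names and hypothetical corpus statistics: `log_pmf` gives the log-probability of observing exactly tf(t, d) occurrences, as the text asks, and `upper_tail` adds, as our own illustrative variant, the probability of observing at least that many.

```python
import math


def occurrence_probability(df_tilde: int, W: int) -> float:
    """p(t) = df~(t) / W: the chance that a random word slot in the corpus holds t."""
    return df_tilde / W


def log_pmf(tf: int, w_d: int, p: float) -> float:
    """ln P[X = tf] for X ~ Binomial(w_d, p), computed with math.lgamma."""
    log_comb = math.lgamma(w_d + 1) - math.lgamma(tf + 1) - math.lgamma(w_d - tf + 1)
    return log_comb + tf * math.log(p) + (w_d - tf) * math.log1p(-p)


def upper_tail(tf: int, w_d: int, p: float) -> float:
    """P[X >= tf]: the chance of seeing at least tf occurrences by chance alone."""
    return sum(math.exp(log_pmf(j, w_d, p)) for j in range(tf, w_d + 1))


# Hypothetical statistics: 50 occurrences of t among W = 1,000,000 words,
# and a document of length w(d) = 2,000 in which t occurs 8 times.
p = occurrence_probability(50, 1_000_000)
print(log_pmf(8, 2_000, p), upper_tail(8, 2_000, p))
```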

Analysis of the probabilistic model. After the above-described random assignment, the resulting number of occurrences of t can be computed as the sum

$$\mathrm{tf}(t, d) = x_1 + \ldots + x_{w(d)},$$

where x_i = 1 if the word at the i-th location is t, and x_i = 0 if the word at the i-th location is different from t. Since the assignments corresponding to different locations are independent and identically distributed, the value tf(t, d) is the sum of w(d) independent identically distributed random variables.

A document usually contains a reasonably large number of words, so, to describe the probability distribution of the value tf(t, d), we can use the Central Limit Theorem, according to which the probability distribution of the sum of many independent identically distributed random variables is close to Gaussian (normal); see, e.g., Sheskin (2011).

A normal distribution is uniquely determined by its mean μ and its variance V = σ². When we add independent random variables, their means add and their variances add. Thus, for the sum of w(d) independent identically distributed random variables x_i, we get μ = w(d) · μ_i, where μ_i is the mean of each of the variables x_i, and V = w(d) · V_i, where V_i is the variance of each of the variables x_i.

Each variable x_i has two possible values v_j: the value v_1 = 1 with probability p_1 = p(t), and the value v_0 = 0 with the remaining probability p_0 = 1 − p(t). Thus,

$$\mu_i = \sum_j p_j \cdot v_j = p(t) \cdot 1 + (1 - p(t)) \cdot 0 = p(t).$$

Similarly,

$$V_i = \sum_j p_j \cdot (v_j - \mu_i)^2 = p(t) \cdot (1 - p(t))^2 + (1 - p(t)) \cdot (0 - p(t))^2 = p(t) \cdot (1 - p(t))^2 + (1 - p(t)) \cdot p(t)^2.$$

The two terms in the right-hand side have a common factor p(t) · (1 − p(t)), so

$$V_i = p(t) \cdot (1 - p(t)) \cdot \big((1 - p(t)) + p(t)\big) = p(t) \cdot (1 - p(t)).$$

Thus,

$$\mu = w(d) \cdot \mu_i = w(d) \cdot p(t); \qquad V = w(d) \cdot p(t) \cdot (1 - p(t)).$$

For a normal distribution, practically all values lie within the interval [μ − k_0 · σ, μ + k_0 · σ], where k_0 is usually taken to be 2, 3, or 6. The larger k_0, the less probable it is for the corresponding value to appear. For a given value x, the corresponding value k_0 is determined by the equality μ ± k_0 · σ = x, so k_0 = ±(x − μ)/σ, and

$$k_0 = \frac{|x - \mu|}{\sigma}.$$
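The mean and variance of a single x_i, the scaling to μ = w(d)·p(t) and V = w(d)·p(t)·(1 − p(t)), and the ratio k_0 can be checked with a short simulation. This sketch assumes hypothetical parameters (w(d) = 200, p(t) = 0.05) and hypothetical function names, and uses Python's `random` module with a fixed seed; it only illustrates the formulas above and is not part of the original analysis.

```python
import math
import random


def bernoulli_moments(p: float) -> tuple[float, float]:
    """Mean and variance of x_i from the definitions, enumerating v_1 = 1 and v_0 = 0."""
    dist = {1: p, 0: 1.0 - p}  # value -> probability
    mean = sum(prob * v for v, prob in dist.items())
    var = sum(prob * (v - mean) ** 2 for v, prob in dist.items())
    return mean, var


def k0(x: float, w_d: int, p: float) -> float:
    """k_0 = |x - mu| / sigma with mu = w(d) p and sigma^2 = w(d) p (1 - p)."""
    mu = w_d * p
    sigma = math.sqrt(w_d * p * (1.0 - p))
    return abs(x - mu) / sigma


def simulate_counts(w_d: int, p: float, trials: int, seed: int = 0) -> list[int]:
    """Monte Carlo draws of tf(t, d) = x_1 + ... + x_{w(d)} under the word-slot model."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(w_d)) for _ in range(trials)]


# Hypothetical parameters: w(d) = 200 words, p(t) = 0.05.
w_d, p = 200, 0.05
counts = simulate_counts(w_d, p, trials=5_000)
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
print(bernoulli_moments(p), w_d * p, w_d * p * (1 - p))
print(sample_mean, sample_var)
```

The sample mean and variance of the simulated counts should be close to w(d)·p(t) and w(d)·p(t)·(1 − p(t)), respectively.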

For x = tf(t, d), the resulting ratio is equal to

$$\frac{\mathrm{tf}(t, d) - w(d) \cdot p(t)}{\sqrt{w(d) \cdot p(t) \cdot (1 - p(t))}}.$$

Let us simplify this formula. A keyword should occur much more frequently in this document than it occurs in the corpus in general. Thus, when we look for keywords, we are interested only in the words for which tf(t, d)/w(d) ≫ p(t). For such words, tf(t, d) ≫ w(d) · p(t); thus, tf(t, d) − w(d) · p(t) ≈ tf(t, d), and therefore, the above formula takes a simplified form

$$\frac{\mathrm{tf}(t, d)}{\sqrt{w(d) \cdot p(t) \cdot (1 - p(t))}}.$$

Also, as we have discussed earlier, we cannot take as meaningful keywords words like "a" or "the" which occur frequently in all the documents. Thus, meaningful keywords should be relatively rare: we should have p(t) ≪ 1. For such words, 1 − p(t) ≈ 1, and we get an even simpler formula for the resulting criterion:

$$\frac{\mathrm{tf}(t, d)}{\sqrt{w(d) \cdot p(t)}}.$$

Substituting the expression for p(t) into this formula, we get the following final expression.

Resulting formula. As keywords corresponding to the document d, we should select words t for which the following value is the largest:

$$\mathrm{tf}(t, d) \cdot \sqrt{\frac{W}{\widetilde{\mathrm{df}}(t) \cdot w(d)}}.$$

Let us compare the new formula with the tf-idf expression. The tf-idf formula corresponds to the case when we ignore the fact that different documents have different lengths, i.e., in effect, assume that all the documents have the same length. If w(d) = const, then the ratio W/w(d) is simply the total number N of documents, and the above formula takes the form

$$\mathrm{tf}(t, d) \cdot \sqrt{\frac{N}{\widetilde{\mathrm{df}}(t)}}.$$

This formula is very similar to tf-idf (to be more precise, it is similar to the modification of tf-idf that we described in the previous section); the main difference is that, instead of the logarithm of the ratio $N/\widetilde{\mathrm{df}}(t)$, we take its square root.
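The three scores discussed in the paper (classic tf-idf, the modification from Section 2, and the length-aware formula derived above) can be computed side by side on a tokenized corpus. A minimal sketch, assuming whitespace-tokenized documents and a hypothetical function name `keyword_scores`; the toy corpus is far too small to satisfy 1 ≪ k ≪ n ≪ N and only illustrates the arithmetic.

```python
import math
from collections import Counter


def keyword_scores(docs: list[list[str]]) -> dict[int, dict[str, tuple[float, float, float]]]:
    """Score every (document index, term) pair with three formulas.

    Returns {doc_index: {term: (tf_idf, modified, length_aware)}}.
    """
    N = len(docs)                                   # number of documents
    counts = [Counter(doc) for doc in docs]         # tf(t, d) for each document
    df = Counter(t for c in counts for t in c)      # df(t): documents containing t
    df_tilde = Counter()                            # df~(t) = tf(t, D): total occurrences
    for c in counts:
        df_tilde.update(c)
    W = sum(len(doc) for doc in docs)               # total number of words in the corpus

    scores = {}
    for i, (doc, c) in enumerate(zip(docs, counts)):
        w_d = len(doc)                              # document length w(d)
        scores[i] = {}
        for t, tf in c.items():
            tf_idf = tf * math.log(N / df[t])
            modified = tf * math.log(N / df_tilde[t]) + tf * math.log(tf)
            length_aware = tf * math.sqrt(W / (df_tilde[t] * w_d))
            scores[i][t] = (tf_idf, modified, length_aware)
    return scores


# Hypothetical toy corpus.
docs = [
    "the turing machine halts the turing machine loops".split(),
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
for term, (tfidf, modified, length_aware) in sorted(keyword_scores(docs)[0].items()):
    print(f"{term:8s} {tfidf:7.3f} {modified:7.3f} {length_aware:7.3f}")
```

When all documents have the same length, the third score reduces to tf(t, d)·sqrt(N / df~(t)), which can be confirmed by calling the function on equal-length documents.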

Acknowledgements

This work was supported in part by the National Science Foundation grants HRD and HRD (Cyber-ShARE Center of Excellence) and DUE. This work was done when L. Havrlant was visiting the University of Texas at El Paso, a visit supported by the Palacky University.

References

Abramowitz, M., and I. Stegun. 2002. Handbook of Mathematical Functions. New York: Dover Publications.
Hiemstra, D. 2000. "A probabilistic justification for using tf×idf term weighting in information retrieval." International Journal of Digital Libraries 3:
Jones, K. S. 1972. "A statistical interpretation of term specificity and its application in retrieval." Journal of Documentation 28 (1): 11–21; reprinted in 2004, 60 (5):
Manning, C. D., P. Raghavan, and H. Schütze. 2008. Introduction to Information Retrieval. New York: Cambridge University Press.
Rajaraman, A., and J. D. Ullman. 2011. Mining of Massive Datasets. Cambridge, UK: Cambridge University Press.
Robertson, S. 2004. "Understanding inverse document frequency: on theoretical arguments for idf." Journal of Documentation 60 (5):
Sheskin, D. J. 2011. Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, Florida: Chapman & Hall/CRC.


More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Newton s Laws: What is Their Operational Meaning?

Newton s Laws: What is Their Operational Meaning? Newto s Laws: What is Their Operatioal Meaig? Olga Kosheleva ad Vladik Kreiovich Uiversity of Texas at El Paso 500 W. Uiversity El Paso, TX 79968, USA olgak@utep.edu, vladik@utep.edu Abstract Newto s mechaics

More information

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS 8.1 Radom Samplig The basic idea of the statistical iferece is that we are allowed to draw ifereces or coclusios about a populatio based

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

x 2 x x x x x + x x +2 x

x 2 x x x x x + x x +2 x Math 5440: Notes o particle radom walk Aaro Fogelso September 6, 005 Derivatio of the diusio equatio: Imagie that there is a distributio of particles spread alog the x-axis ad that the particles udergo

More information

Chapter 13, Part A Analysis of Variance and Experimental Design

Chapter 13, Part A Analysis of Variance and Experimental Design Slides Prepared by JOHN S. LOUCKS St. Edward s Uiversity Slide 1 Chapter 13, Part A Aalysis of Variace ad Eperimetal Desig Itroductio to Aalysis of Variace Aalysis of Variace: Testig for the Equality of

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation Metodološki zvezki, Vol. 13, No., 016, 117-130 Approximate Cofidece Iterval for the Reciprocal of a Normal Mea with a Kow Coefficiet of Variatio Wararit Paichkitkosolkul 1 Abstract A approximate cofidece

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

Rotationally invariant integrals of arbitrary dimensions

Rotationally invariant integrals of arbitrary dimensions September 1, 14 Rotatioally ivariat itegrals of arbitrary dimesios James D. Wells Physics Departmet, Uiversity of Michiga, A Arbor Abstract: I this ote itegrals over spherical volumes with rotatioally

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

CHAPTER 2. Mean This is the usual arithmetic mean or average and is equal to the sum of the measurements divided by number of measurements.

CHAPTER 2. Mean This is the usual arithmetic mean or average and is equal to the sum of the measurements divided by number of measurements. CHAPTER 2 umerical Measures Graphical method may ot always be sufficiet for describig data. You ca use the data to calculate a set of umbers that will covey a good metal picture of the frequecy distributio.

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

IP Reference guide for integer programming formulations.

IP Reference guide for integer programming formulations. IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more

More information

Section 4.3. Boolean functions

Section 4.3. Boolean functions Sectio 4.3. Boolea fuctios Let us take aother look at the simplest o-trivial Boolea algebra, ({0}), the power-set algebra based o a oe-elemet set, chose here as {0}. This has two elemets, the empty set,

More information

Math 10A final exam, December 16, 2016

Math 10A final exam, December 16, 2016 Please put away all books, calculators, cell phoes ad other devices. You may cosult a sigle two-sided sheet of otes. Please write carefully ad clearly, USING WORDS (ot just symbols). Remember that the

More information

Lecture 2 February 8, 2016

Lecture 2 February 8, 2016 MIT 6.854/8.45: Advaced Algorithms Sprig 206 Prof. Akur Moitra Lecture 2 February 8, 206 Scribe: Calvi Huag, Lih V. Nguye I this lecture, we aalyze the problem of schedulig equal size tasks arrivig olie

More information

(3) If you replace row i of A by its sum with a multiple of another row, then the determinant is unchanged! Expand across the i th row:

(3) If you replace row i of A by its sum with a multiple of another row, then the determinant is unchanged! Expand across the i th row: Math 5-4 Tue Feb 4 Cotiue with sectio 36 Determiats The effective way to compute determiats for larger-sized matrices without lots of zeroes is to ot use the defiitio, but rather to use the followig facts,

More information

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS

THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS DEMETRES CHRISTOFIDES Abstract. Cosider a ivertible matrix over some field. The Gauss-Jorda elimiatio reduces this matrix to the idetity

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

CHAPTER I: Vector Spaces

CHAPTER I: Vector Spaces CHAPTER I: Vector Spaces Sectio 1: Itroductio ad Examples This first chapter is largely a review of topics you probably saw i your liear algebra course. So why cover it? (1) Not everyoe remembers everythig

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

MA131 - Analysis 1. Workbook 9 Series III

MA131 - Analysis 1. Workbook 9 Series III MA3 - Aalysis Workbook 9 Series III Autum 004 Cotets 4.4 Series with Positive ad Negative Terms.............. 4.5 Alteratig Series.......................... 4.6 Geeral Series.............................

More information

The Binomial Theorem

The Binomial Theorem The Biomial Theorem Robert Marti Itroductio The Biomial Theorem is used to expad biomials, that is, brackets cosistig of two distict terms The formula for the Biomial Theorem is as follows: (a + b ( k

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Sequences of Definite Integrals, Factorials and Double Factorials

Sequences of Definite Integrals, Factorials and Double Factorials 47 6 Joural of Iteger Sequeces, Vol. 8 (5), Article 5.4.6 Sequeces of Defiite Itegrals, Factorials ad Double Factorials Thierry Daa-Picard Departmet of Applied Mathematics Jerusalem College of Techology

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

Confidence intervals summary Conservative and approximate confidence intervals for a binomial p Examples. MATH1005 Statistics. Lecture 24. M.

Confidence intervals summary Conservative and approximate confidence intervals for a binomial p Examples. MATH1005 Statistics. Lecture 24. M. MATH1005 Statistics Lecture 24 M. Stewart School of Mathematics ad Statistics Uiversity of Sydey Outlie Cofidece itervals summary Coservative ad approximate cofidece itervals for a biomial p The aïve iterval

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

Bernoulli numbers and the Euler-Maclaurin summation formula

Bernoulli numbers and the Euler-Maclaurin summation formula Physics 6A Witer 006 Beroulli umbers ad the Euler-Maclauri summatio formula I this ote, I shall motivate the origi of the Euler-Maclauri summatio formula. I will also explai why the coefficiets o the right

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

Topic 5: Basics of Probability

Topic 5: Basics of Probability Topic 5: Jue 1, 2011 1 Itroductio Mathematical structures lie Euclidea geometry or algebraic fields are defied by a set of axioms. Mathematical reality is the developed through the itroductio of cocepts

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Anna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2

Anna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2 Aa Jaicka Mathematical Statistics 18/19 Lecture 1, Parts 1 & 1. Descriptive Statistics By the term descriptive statistics we will mea the tools used for quatitative descriptio of the properties of a sample

More information

Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value

More information

Median and IQR The median is the value which divides the ordered data values in half.

Median and IQR The median is the value which divides the ordered data values in half. STA 666 Fall 2007 Web-based Course Notes 4: Describig Distributios Numerically Numerical summaries for quatitative variables media ad iterquartile rage (IQR) 5-umber summary mea ad stadard deviatio Media

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

ADVANCED SOFTWARE ENGINEERING

ADVANCED SOFTWARE ENGINEERING ADVANCED SOFTWARE ENGINEERING COMP 3705 Exercise Usage-based Testig ad Reliability Versio 1.0-040406 Departmet of Computer Ssciece Sada Narayaappa, Aeliese Adrews Versio 1.1-050405 Departmet of Commuicatio

More information

Bounds for the Positive nth-root of Positive Integers

Bounds for the Positive nth-root of Positive Integers Pure Mathematical Scieces, Vol. 6, 07, o., 47-59 HIKARI Ltd, www.m-hikari.com https://doi.org/0.988/pms.07.7 Bouds for the Positive th-root of Positive Itegers Rachid Marsli Mathematics ad Statistics Departmet

More information

The Riemann Zeta Function

The Riemann Zeta Function Physics 6A Witer 6 The Riema Zeta Fuctio I this ote, I will sketch some of the mai properties of the Riema zeta fuctio, ζ(x). For x >, we defie ζ(x) =, x >. () x = For x, this sum diverges. However, we

More information

Probability and statistics: basic terms

Probability and statistics: basic terms Probability ad statistics: basic terms M. Veeraraghava August 203 A radom variable is a rule that assigs a umerical value to each possible outcome of a experimet. Outcomes of a experimet form the sample

More information

NICK DUFRESNE. 1 1 p(x). To determine some formulas for the generating function of the Schröder numbers, r(x) = a(x) =

NICK DUFRESNE. 1 1 p(x). To determine some formulas for the generating function of the Schröder numbers, r(x) = a(x) = AN INTRODUCTION TO SCHRÖDER AND UNKNOWN NUMBERS NICK DUFRESNE Abstract. I this article we will itroduce two types of lattice paths, Schröder paths ad Ukow paths. We will examie differet properties of each,

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information