Inverse Document Frequency (IDF): A Measure of Deviations from Poisson
Kenneth W. Church, William A. Gale
AT&T Bell Laboratories, Murray Hill, NJ, USA
kwc@research.att.com

Abstract

Low frequency words tend to be rich in content, and vice versa. But not all equally frequent words are equally meaningful. We will use inverse document frequency (IDF), a quantity borrowed from Information Retrieval, to distinguish words like somewhat and boycott. Both somewhat and boycott appeared approximately 1000 times in a corpus of 1989 Associated Press articles, but boycott is a better keyword because its IDF is farther from what would be expected by chance (Poisson).

1. Document frequency is similar to word frequency, but different

Word frequency is commonly used in all sorts of natural language applications. The practice implicitly assumes that words (and ngrams) are distributed by a single-parameter distribution such as a Poisson or a Binomial. But we find that these distributions do not fit the data very well. Both the Poisson and Binomial assume that the variance over documents is no larger than the mean, and yet, we find that it can be quite a bit larger, especially for interesting words such as boycott where there are hidden variables such as topic that conspire to undermine the independence assumption behind the Poisson and the Binomial. Much better fits are obtained by introducing a second parameter such as inverse document frequency (IDF).

Inverse document frequency (IDF) is commonly used in Information Retrieval (Sparck Jones, 1972). IDF is defined as $-\log_2 (df_w / D)$, where $D$ is the number of documents in the collection and $df_w$ is the document frequency, the number of documents that contain $w$. Obviously, there is a strong relationship between document frequency, $df_w$, and word frequency, $f_w$. The relationship is shown in Figure 1, a plot of $\log_{10} f_w$ and IDF for 193 words selected from a 50 million word corpus of 1989 Associated Press (AP) Newswire stories ($D$ = 85,432 stories).
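The definition of IDF above can be sketched in a few lines of Python. This is only an illustration: the function name `idf` is an assumption, and the document frequencies for boycott (676) and somewhat (979) are the 1989 AP figures quoted elsewhere in this paper.

```python
import math

def idf(df: int, num_docs: int) -> float:
    """Inverse document frequency in bits: -log2(df / D)."""
    if not 0 < df <= num_docs:
        raise ValueError("df must satisfy 0 < df <= num_docs")
    return -math.log2(df / num_docs)

D = 85_432  # number of 1989 AP stories reported in the text

# Document frequencies quoted in the paper for the two example words.
for word, df in [("boycott", 676), ("somewhat", 979)]:
    print(f"{word}: df = {df}, IDF = {idf(df, D):.2f} bits")
```

A word that appears in every document has an IDF of zero bits, and the rarer the word (in terms of documents, not tokens), the more bits it carries.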
Although $\log_{10} f_w$ is highly correlated with IDF ($\rho$ = ), it would be a mistake to assume that the two variables are completely predictable from one another. Indeed, the experience of the Information Retrieval community has indicated that IDF is a very useful quantity. Attempts to replace IDF with $f_w$ (or some simple transform of $f_w$) have not been very successful.

Figure 2 shows one such attempt. It compares the observed IDF with $\widehat{IDF}$, an estimate based on $f$. Assume that a document is merely a bag of words with no interesting structure (content). Words are randomly generated by a Poisson process, $\pi$. The probability of $k$ instances of a word $w$ is $\pi(\theta, k)$, where $\theta = f_w / D$:

$$\pi(\theta, k) = \frac{e^{-\theta}\,\theta^{k}}{k!} \quad \text{for } k = 0, 1, \ldots \qquad \text{(Poisson)}$$

In particular, the probability that $w$ will not be found in a document is $\pi(\theta, 0)$. Conversely, the probability of at least one $w$ is $1 - \pi(\theta, 0)$. And therefore, IDF ought to be:

$$\widehat{IDF} = -\log_2 \bigl(1 - \pi(\theta, 0)\bigr) = -\log_2 \bigl(1 - e^{-\theta}\bigr) \qquad \text{(Predicted IDF)}$$

Figure 2 compares IDF with $\widehat{IDF}$. Note that $\widehat{IDF}$ is systematically too low, indicating that the predictions are missing crucial generalizations. Documents are more than just a bag of words. The prediction errors are shown in more detail in Figure 3, which plots the residual IDF (difference between predicted and observed) as a function of $\log_{10} f_w$ for the same 193 words shown in Figure 2. The prediction errors are relatively large in the middle of the frequency range, and smaller at both ends. Unfortunately, we believe the words in the middle are often the most important words for Information Retrieval purposes.

[Figure 1 plot: IDF versus log10 frequency.]
Figure 1: IDF is highly correlated with log frequency ($\rho$ = ). The circles show $\log_{10} f$ and IDF for 193 words selected from a corpus of 1989 Associated Press Newswire stories ($D$ = 85,432).

2. A Good Keyword is far from Poisson

To get a better look at the crucial differences between IDF and $f$ in the middle frequency range ($f \approx 10^3$), we selected a set of 53 words for further investigation with $1000 < f < 1020$ in the 1989 AP corpus. The 53 words are shown in Table 1, sorted by df. Note that the words near the top of the list tend to be more appropriate for use in an information retrieval system than the words toward the bottom of the list. Stories that mention the word boycott, for example, are likely to be about boycotts. In contrast, stories that mention the word somewhat could be about practically anything.¹
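The Poisson-based prediction of IDF from Section 1 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are assumptions, IDF is taken as $-\log_2(df/D)$ so that values are positive bits, and the inputs for boycott ($f$ = 1009, $df$ = 676) are the 1989 AP figures quoted in this paper.

```python
import math

def poisson_pmf(theta: float, k: int) -> float:
    """pi(theta, k): Poisson probability of k instances in one document."""
    return math.exp(-theta) * theta ** k / math.factorial(k)

def predicted_idf(f: int, num_docs: int) -> float:
    """IDF-hat = -log2(1 - pi(theta, 0)) with theta = f / D."""
    theta = f / num_docs
    return -math.log2(1.0 - poisson_pmf(theta, 0))

def observed_idf(df: int, num_docs: int) -> float:
    """Observed IDF = -log2(df / D)."""
    return -math.log2(df / num_docs)

D = 85_432
f_boycott, df_boycott = 1009, 676  # figures quoted in the paper

pred = predicted_idf(f_boycott, D)
obs = observed_idf(df_boycott, D)
print(f"predicted {pred:.2f} bits, observed {obs:.2f} bits, gap {obs - pred:.2f} bits")
```

For boycott the bag-of-words prediction falls short of the observed IDF by roughly half a bit, which is the kind of gap that a transform of $f$ alone cannot recover.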
[Figure 2 plot; axes: predicted IDF and observed IDF.]
Figure 2: The observed IDF is systematically lower than what would be expected under a Poisson, $-\log_2 (1 - e^{-f/D})$. All but 6 of the circles fall below the x = y line. The data are the same as in Figure 1.

Why is IDF such a useful quantity? One might try to answer the question in terms of information theory (Shannon, 1948). IDF can be thought of as the usefulness in bits of a keyword to a keyword retrieval system. If we tell you that the document that we are looking for has the keyword boycott, then we have narrowed the search space down to just 676 of the $D$ documents. But this answer doesn't explain the fundamental difference between boycott and somewhat. boycott has an IDF of $-\log_2 (676/D) = 7.0$ bits, only a little more than somewhat, which has an IDF of $-\log_2 (979/D) = 6.4$ bits. And yet, boycott is a reasonable keyword and somewhat is not.

A good keyword, like boycott, picks out a very specific set of documents. The problem with somewhat is that it behaves almost like chance (Poisson). Under a Poisson, the 1013 instances of somewhat should be found in approximately $D(1 - \pi(\theta, 0)) = D(1 - \pi(1013/85432, 0)) \approx 1007$ documents. In fact, somewhat was found in 979 documents, only a little less than what would have been expected by chance. Good keywords tend to bunch up into many fewer documents. boycott, for example, bunches up into only 676 documents, much less than chance ($D(1 - \pi(1009/85432, 0)) \approx 1003$). Almost all words are more interesting in this sense than Poisson, but good keywords like boycott are a lot more interesting than Poisson, and crummy ones like somewhat are only a little more interesting than Poisson.

¹ There is a weak tendency for nouns to appear higher on the list than non-nouns, though the tendency is too weak to explain the pattern of the systematic deviations from Poisson. In addition, there are plenty of exceptions in both directions: rape, pool, grants, code and premier are not necessarily nouns, and sweeping, leads, bound and worry are not necessarily non-nouns.
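The chance expectations quoted in Section 2 (about 1007 documents for somewhat and about 1003 for boycott) follow directly from $D(1 - \pi(\theta, 0))$. The sketch below reproduces that arithmetic; the function name is an assumption, and the word counts are the ones given in the text.

```python
import math

def expected_docs_poisson(f: int, num_docs: int) -> float:
    """Expected number of documents containing a word at least once,
    D * (1 - exp(-theta)) with theta = f / D."""
    theta = f / num_docs
    # -expm1(-theta) equals 1 - exp(-theta) and stays accurate for small theta.
    return num_docs * -math.expm1(-theta)

D = 85_432
# (word, word frequency f, observed document frequency df) as quoted in the paper
for word, f, df in [("somewhat", 1013, 979), ("boycott", 1009, 676)]:
    expected = expected_docs_poisson(f, D)
    print(f"{word}: expected {expected:.0f} documents, observed {df}, "
          f"ratio {df / expected:.2f}")
```

The ratio of observed to expected document frequency gives a quick reading of how strongly a word bunches up: it is close to 1 for somewhat and well below 1 for boycott.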
[Figure 3 plot: predicted IDF - observed IDF versus log10 frequency; labeled points include Petitioner, Fromm, Germans, Gray, Stevens, could and which.]
Figure 3: The prediction errors are systematically positive. The errors tend to be larger in the middle of the frequency range (Germans), and smaller at both ends (Fromm, which). The data are the same as in Figures 1-2.

On this account, a good keyword is one that behaves very differently from the null hypothesis (Poisson). We conjecture that the best keywords tend to be found toward the middle of the frequency range, where there are relatively large deviations from Poisson, as illustrated in Figure 3. This hypothesis runs counter to the standard practice in Information Retrieval of weighting words by IDF, favoring extremely rare words, no matter how they are distributed.

Of course, IDF is but one of many ways to show deviations from chance. Figure 4 shows the distributions for boycott and somewhat. Note that somewhat is much closer to Poisson in almost any sense of closeness that one might consider. Three measures of closeness are presented in Table 2: IDF, variance ($\sigma^2$), and entropy ($H$). Table 2 compares the top 10 words in Table 1 (labeled "better keywords") with the bottom 10 words in Table 1 (labeled "worse keywords"). The better keywords have more IDF, more variance and less entropy than what would be expected under a Poisson with $\theta \approx f/D \approx 1000/85{,}432$.

Table 1: More IDF (less df) → More Content

df   w            df   w             df   w             df   w
435  governors    724  pool          827  unity         937  worry
506  festival     740  restaurants   845  bed           940  containing
551  gang         745  grants        847  coastal       946  explained
553  bullion      752  scheme        851  educational   951  bound
563  attendants   754  code          853  lying         953  leads
623  rape         761  premier       853  neighbor      955  happens
639  palace       775  wire          863  tragedy       960  improving
676  boycott      781  customer      867  acquire       960  welcomed
687  routes       783  rooms         874  restored      961  triggered
690  incentives   786  engineering   905  legitimate    966  sweeping
695  poverty      803  color         910  deliver       968  fairly
718  donations    811  possession    914  types         969  heading
722  lawsuits     815  projected     929  reject        979  somewhat
                                                        986  noting

[Figure 4 plot: number of documents, D * Pr(k), versus k, for somewhat, boycott and the Poisson.]
Figure 4: Most words have a fatter tail than Poisson (solid line). The deviations from Poisson are more salient for good keywords like boycott than for crummy keywords like somewhat.

3. How robust are these deviations from chance?

We were concerned that the crucial deviations from Poisson behavior might not hold up if we looked at another corpus of similar material. Figure 5 shows the word boycott in five different years of the AP news. The fat tails show up in each of the five years. Clearly, the non-Poisson phenomenon is robust. Figures 6 and 7 compare IDF and $\log_{10} \sigma^2$ for the 53 words in Table 1, and find that IDF and $\log_{10} \sigma^2$ are reasonably stable across years. The correlations of IDF and $\log_{10} \sigma^2$ across years are presented in Tables 3-4. All of the correlations are quite large. The correlations for IDF are perhaps somewhat larger than those for $\log_{10} \sigma^2$, suggesting that IDF may be somewhat more robust, which is not surprising given that empirical estimates of variance are notoriously subject to outliers. None of the correlations in Tables 3 and 4 can be attributed to word frequency effects since the 53 words were all chosen with almost the same 1989 frequency.

In general, the correlations in Tables 3-4 are larger near the diagonal, suggesting that estimates degrade over time. If you want to predict next year's IDF, it is better to use this year's estimate than a ten-year-old estimate.
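The three closeness measures named for Table 2 (IDF, variance and entropy) can be computed from per-document counts of a word and set against a Poisson with the same mean. The sketch below uses made-up toy data, not AP counts, and it assumes that entropy means the entropy of the distribution of per-document counts; the paper does not spell out its exact definition, so treat that choice as an assumption.

```python
import math
from collections import Counter

def empirical_measures(counts):
    """counts: occurrences of one word in each document (zeros included).
    Assumes the word occurs in at least one document."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / n
    df = sum(1 for c in counts if c > 0)
    hist = Counter(counts)  # how many documents have each count k
    entropy = -sum((m / n) * math.log2(m / n) for m in hist.values())
    return {"theta": mean, "idf": -math.log2(df / n),
            "var": variance, "entropy": entropy}

def poisson_measures(theta, kmax=30):
    """Same measures under a Poisson with mean theta (pmf truncated at kmax)."""
    pmf = [math.exp(-theta) * theta ** k / math.factorial(k)
           for k in range(kmax + 1)]
    entropy = -sum(p * math.log2(p) for p in pmf if p > 0)
    return {"theta": theta, "idf": -math.log2(1.0 - pmf[0]),
            "var": theta, "entropy": entropy}

# Hypothetical toy corpus of 1000 documents; both words occur 40 times in total.
bursty = [4] * 10 + [0] * 990   # bunched into 10 documents
spread = [1] * 40 + [0] * 960   # one occurrence in each of 40 documents

for name, counts in [("bursty", bursty), ("spread", spread)]:
    print(name, empirical_measures(counts))
print("poisson", poisson_measures(sum(bursty) / len(bursty)))
```

In this toy setting the bursty word shows the same pattern the text reports for better keywords: more IDF, more variance and less entropy than the Poisson reference.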
Table 2: Good keywords have more IDF, more var and less entropy than Poisson

Better Keywords (IDF, var, entropy)    Worse Keywords (IDF, var, entropy)
governors                              leads
festival                               happens
gang                                   improving
bullion                                welcomed
attendants                             triggered
rape                                   sweeping
palace                                 fairly
boycott                                heading
routes                                 somewhat
incentives                             noting
Poisson                                Poisson

[Table values not recoverable from the source.]

[Figure 5 plot: number of documents, D * Pr(k), versus k; curves labelled Poisson and K.]
Figure 5: The strong deviations from Poisson for the word boycott show up very clearly in the AP in 1988, 1989, 1990, 1991 and 1992 (dotted lines). Katz' K-mixture (Katz, personal communication), the solid line labelled K, fits the data better than the Poisson.

Another way to confirm that our measurements of IDF, variance and $H$ have consequences across years in the AP data is to note that measurements of IDF, variance and $H$ in 1989 can be used to predict word frequency in some other year. The correlations are shown in Table 5. They may not be large, but they are too large to be due to chance and they all point in the same direction. The correlations cannot be attributed to variations in frequency in 1989, since all 53 words have almost the same 1989 frequency. Clearly, there are some interesting systematic relationships between IDF/variance/$H$ and $f$ that hold up to replication across multiple years in the AP, measurement errors, and other sources of noise.
Figure 6: IDF in one year of the AP is very predictive of IDF in another (for the 53 words in Table 1). Each scatter plot compares IDF in one year with IDF in another. The fact that most of the points line up fairly well indicates that IDF values are strongly correlated across years. The correlations are shown in Table 3.

4. Katz' K-mixture

Clearly, the Poisson does not fit our data very well, especially for good keywords like boycott. This is, however, a negative result. Can we say something more constructive? Katz (personal communication) proposed the following alternative to the Poisson, where $Pr_K(k)$ is the probability of $k$ instances of $w$ in a document:

$$Pr_K(k) = (1 - \alpha)\,\delta_{k,0} + \frac{\alpha}{\beta + 1}\left(\frac{\beta}{\beta + 1}\right)^{k} \qquad \text{(K-mixture)}$$

$\delta_{k,0}$ is 1 when $k = 0$, and 0 otherwise.

Katz' K-mixture distribution can be thought of as a mixture of Poissons. Suppose that, within documents, boycott is distributed by a Poisson process, but, across documents, the Poisson parameter $\theta$ is allowed to vary from one document to another depending on how much the document is about boycotts. In other words, $Pr_K(k)$ can be expressed as a convolution of Poissons with a density function $\phi$:

$$Pr(k) = \int_0^{\infty} \phi(\theta)\,\pi(\theta, k)\,d\theta \quad \text{for } k = 0, 1, \ldots \qquad \text{(Poisson Mixture)}$$

In this way, the $\theta$s can depend on an infinite number of unknowable hidden variables, e.g., what the documents are about, who wrote them, when they were written, what was going on in the world when they were written, etc., but we don't need to know these dependencies for any particular document. All we need to know is $\phi$, the density of $\theta$s, aggregated over all possible combinations of hidden variables.
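The K-mixture formula above is easy to evaluate directly. The sketch below is an illustration only: the function name and the parameter values $\alpha$ = 0.02 and $\beta$ = 0.5 are hypothetical, chosen to show that the probabilities sum to one and that the mean number of instances per document is $\alpha\beta$.

```python
def k_mixture_pmf(k: int, alpha: float, beta: float) -> float:
    """Pr_K(k) = (1 - alpha) * [k == 0] + alpha / (beta + 1) * (beta / (beta + 1)) ** k"""
    zero_spike = (1.0 - alpha) if k == 0 else 0.0
    return zero_spike + alpha / (beta + 1.0) * (beta / (beta + 1.0)) ** k

# Hypothetical parameter values, for illustration only.
alpha, beta = 0.02, 0.5

# The geometric tail decays fast, so 200 terms are far more than enough here.
total = sum(k_mixture_pmf(k, alpha, beta) for k in range(200))
mean = sum(k * k_mixture_pmf(k, alpha, beta) for k in range(200))

print(f"sum of probabilities = {total:.6f}")
print(f"mean count = {mean:.6f}, alpha * beta = {alpha * beta:.6f}")
```

Because the mean is $\alpha\beta$, the product of the two parameters plays the role that $\theta = f/D$ plays for the Poisson, while the split between $\alpha$ and $\beta$ controls how bunched the instances are.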
Figure 7: $\log_{10} \sigma^2$ is also predictable from one year to the next, though maybe not as predictable as IDF (for the 53 words in Table 1). The correlations are shown in Table 4.

Table 3: Correlations of IDF across years

Table 4: Correlations of log var across years

Table 5: Correlations of IDF, log var and H in 1989 with log f in other years

               1988 log f   1990 log f   1991 log f   1992 log f
1989 IDF
1989 log var
1989 H

[Table values not recoverable from the source.]

In the case of Katz' K-mixture, $\phi(\theta)$ is assumed to be

$$\phi(\theta) = (1 - \alpha)\,\delta(\theta) + \frac{\alpha}{\beta}\,e^{-\theta/\beta}$$

where $\delta$ is Dirac's delta function, which is zero everywhere except at 0 and integrates to 1.
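As a check on the density just given, the following short derivation (a worked sketch using only the Poisson and mixture formulas already stated in this paper) shows that integrating $\phi(\theta)$ against the Poisson recovers the K-mixture.

```latex
\begin{align*}
Pr(k) &= \int_0^{\infty} \phi(\theta)\,\pi(\theta,k)\,d\theta \\
      &= (1-\alpha)\,\pi(0,k)
         + \frac{\alpha}{\beta}\int_0^{\infty} e^{-\theta/\beta}\,
           \frac{e^{-\theta}\,\theta^{k}}{k!}\,d\theta \\
      &= (1-\alpha)\,\delta_{k,0}
         + \frac{\alpha}{\beta}\cdot\frac{1}{k!}\cdot
           \frac{k!}{\left(1+\frac{1}{\beta}\right)^{k+1}} \\
      &= (1-\alpha)\,\delta_{k,0}
         + \frac{\alpha}{\beta}\left(\frac{\beta}{\beta+1}\right)^{k+1} \\
      &= (1-\alpha)\,\delta_{k,0}
         + \frac{\alpha}{\beta+1}\left(\frac{\beta}{\beta+1}\right)^{k}
       = Pr_K(k).
\end{align*}
```

The second line uses the fact that a Poisson with $\theta = 0$ puts all of its mass on $k = 0$, and the third uses the standard integral $\int_0^{\infty} \theta^{k} e^{-c\theta}\,d\theta = k!/c^{k+1}$ with $c = 1 + 1/\beta$.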
Katz' K-mixture has two parameters, $\alpha$ and $\beta$. The $\alpha$ parameter determines the fraction of relevant and irrelevant documents. $1 - \alpha$ of the documents have no chance of mentioning boycott ($\theta = 0$) because they are totally irrelevant to boycotts. The $\beta$ parameter determines the average $\theta$ among the relevant documents. The two parameters, $\alpha$ and $\beta$, can be fit from almost any pair of variables considered thus far, e.g., $f$, IDF, $\sigma^2$, $H$. We have found that $f$ and IDF are particularly easy to work with, and are more robust than some others such as $\sigma^2$.

$$\beta = \frac{f}{D}\,2^{IDF} - 1 \qquad\qquad \alpha = \frac{f}{D}\cdot\frac{1}{\beta}$$

It has been our experience that Katz' K-mixture fits the data much better than the Poisson, as can be seen in Figure 5. Unlike the Poisson, the K-mixture has two parameters, $\alpha$ and $\beta$, and can therefore account for the fact that IDF and $f$ are not completely predictable from one another.

In related work (Church and Gale, submitted), we looked at a number of different Poisson mixtures, and found that our data can also be fit by a negative binomial, which can be viewed as a Poisson mixture where $\phi_{NB}(\theta)$ is a Gamma distribution (Johnson and Kotz, 1969). See Mosteller and Wallace (1964) for an example of how to use the negative binomial in a Bayesian discrimination task. It is straightforward to generalize the Mosteller and Wallace approach to use Katz' K-mixture or any other mixture of Poissons.

5. Conclusions

Documents are much more than just a bag of words. The Poisson distribution predicts that lightning is unlikely to strike twice in a single document. We shouldn't expect to see two or more instances of boycott in the same document (unless there is some sort of hidden dependency that goes beyond the Poisson). But when it rains, it pours. If a document is about boycotts, we shouldn't be surprised to find two boycotts or even a half dozen in a single document. The standard use of the Poisson in modeling the distribution of words and ngrams fails to fit the data except where there are almost no interesting hidden dependencies, as in the case of somewhat.
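The "lightning strikes twice" point can be made concrete by fitting the K-mixture to boycott with the estimates from Section 4, $\beta = (f/D)\,2^{IDF} - 1$ and $\alpha = (f/D)/\beta$, and comparing the chance of a repeat mention under each model. This is a sketch under stated assumptions: the function names are hypothetical, and the inputs ($f$ = 1009, $df$ = 676, $D$ = 85,432) are the 1989 AP figures quoted in this paper.

```python
import math

def fit_k_mixture(f: int, df: int, num_docs: int):
    """Fit (alpha, beta) from word frequency f and IDF = -log2(df / D)."""
    if f <= df:
        raise ValueError("need f > df, otherwise beta would not be positive")
    idf = -math.log2(df / num_docs)
    beta = (f / num_docs) * 2.0 ** idf - 1.0  # algebraically f / df - 1
    alpha = (f / num_docs) / beta
    return alpha, beta

def repeat_given_present_poisson(theta: float) -> float:
    """P(k >= 2 | k >= 1) under a Poisson with mean theta."""
    p_at_least_one = -math.expm1(-theta)      # 1 - exp(-theta)
    p_exactly_one = theta * math.exp(-theta)
    return (p_at_least_one - p_exactly_one) / p_at_least_one

def repeat_given_present_kmix(beta: float) -> float:
    """P(k >= 2 | k >= 1) under the K-mixture; alpha cancels in the ratio."""
    return beta / (beta + 1.0)

D, f, df = 85_432, 1009, 676  # boycott, as quoted in the paper
alpha, beta = fit_k_mixture(f, df, D)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
print(f"Poisson:   P(repeat | present) = {repeat_given_present_poisson(f / D):.4f}")
print(f"K-mixture: P(repeat | present) = {repeat_given_present_kmix(beta):.4f}")
```

Under the Poisson, a document that mentions boycott once is very unlikely to mention it again, whereas the fitted K-mixture assigns a repeat mention a probability of about one in three.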
Why are the deviations from Poisson more salient for interesting words like boycott than for boring words like somewhat? Many applications such as information retrieval, text categorization, author identification and word-sense disambiguation attempt to discriminate documents on the basis of certain hidden variables such as topic, author, genre, style, etc. The more that a keyword (or ngram) deviates from Poisson, the stronger the dependence on hidden variables, and the more useful the keyword (or ngram) is for discriminating documents on the basis of these hidden dependencies. Similar arguments apply in a host of other important applications such as text compression and language modeling for speech recognition, where it is desirable for word and ngram probabilities to adapt appropriately to frequency changes due to various hidden dependencies.

We have used document frequency, df, a concept borrowed from Information Retrieval, to find deviations from Poisson behavior. Document frequency is similar to word frequency, but different in a subtle but crucial way. Although inverse document frequency (IDF) and $\log_{10} f$ are extremely highly correlated ($\rho$ = ), it would be a mistake to try to model one with a simple transform of the other. Figure 2 showed one such attempt, where $f$ was transformed into a predicted IDF by introducing a Poisson assumption: $\widehat{IDF} = -\log_2 (1 - e^{-\theta})$, with $\theta = f_w / D$. Unfortunately, the prediction errors were relatively large for the most important keywords, words with moderate frequencies such as Germans.

To get a better look at the subtle differences between document frequency and word frequency, we focused our attention on a set of 53 words that all had approximately the same word frequency in a corpus of 1989 AP stories. Table 1 showed that words with larger IDF tend to have more content. boycott, for example, is a better keyword than somewhat because it bunches up into a relatively small set of documents. Table 2 showed that variance and entropy can also be used as a measure of content (at least among a set of words with more or less the same word frequency). A good keyword like boycott is farther from Poisson (chance) than a crummy keyword like somewhat by almost any sense of closeness that one might consider, e.g., IDF, variance, entropy.

These crucial deviations from Poisson are robust. We showed in Section 3 that deviations from Poisson in one year of the AP can be used to predict deviations in another year of the AP.

Acknowledgments

This work benefited considerably from extensive discussions with Slava Katz.

References

Church, K., and Gale, W. (submitted) Poisson Mixtures.
Johnson, N., and Kotz, S. (1969) Discrete Distributions, Houghton Mifflin, Boston.
Katz, S. (in preparation).
Mosteller, F., and Wallace, D. (1964) Inference and Disputed Authorship: The Federalist, Addison-Wesley, Reading, Massachusetts.
Salton, G. (1989) Automatic Text Processing, Addison-Wesley.
Shannon, C. (1948) The Mathematical Theory of Communication, Bell System Technical Journal.
Sparck Jones, K. (1972) A Statistical Interpretation of Term Specificity and its Application in Retrieval, Journal of Documentation, 28:1.
van Rijsbergen, C. (1979) Information Retrieval, Second Edition, Butterworths, London.
What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,
More informationDistributions, spatial statistics and a Bayesian perspective
Distributins, spatial statistics and a Bayesian perspective Dug Nychka Natinal Center fr Atmspheric Research Distributins and densities Cnditinal distributins and Bayes Thm Bivariate nrmal Spatial statistics
More information[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y )
(Abut the final) [COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t m a k e s u r e y u a r e r e a d y ) The department writes the final exam s I dn't really knw what's n it and I can't very well
More informationSIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST. Mark C. Otto Statistics Research Division, Bureau of the Census Washington, D.C , U.S.A.
SIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST Mark C. Ott Statistics Research Divisin, Bureau f the Census Washingtn, D.C. 20233, U.S.A. and Kenneth H. Pllck Department f Statistics, Nrth Carlina State
More informationAristotle I PHIL301 Prof. Oakes Winthrop University updated: 3/14/14 8:48 AM
Aristtle I PHIL301 Prf. Oakes Winthrp University updated: 3/14/14 8:48 AM The Categries - The Categries is ne f several imprtant wrks by Aristtle n metaphysics. His tpic here is the classificatin f beings
More information4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression
4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw
More informationOn Huntsberger Type Shrinkage Estimator for the Mean of Normal Distribution ABSTRACT INTRODUCTION
Malaysian Jurnal f Mathematical Sciences 4(): 7-4 () On Huntsberger Type Shrinkage Estimatr fr the Mean f Nrmal Distributin Department f Mathematical and Physical Sciences, University f Nizwa, Sultanate
More informationPhysics 2010 Motion with Constant Acceleration Experiment 1
. Physics 00 Mtin with Cnstant Acceleratin Experiment In this lab, we will study the mtin f a glider as it accelerates dwnhill n a tilted air track. The glider is supprted ver the air track by a cushin
More informationFlipping Physics Lecture Notes: Simple Harmonic Motion Introduction via a Horizontal Mass-Spring System
Flipping Physics Lecture Ntes: Simple Harmnic Mtin Intrductin via a Hrizntal Mass-Spring System A Hrizntal Mass-Spring System is where a mass is attached t a spring, riented hrizntally, and then placed
More informationCESAR Science Case The differential rotation of the Sun and its Chromosphere. Introduction. Material that is necessary during the laboratory
Teacher s guide CESAR Science Case The differential rtatin f the Sun and its Chrmsphere Material that is necessary during the labratry CESAR Astrnmical wrd list CESAR Bklet CESAR Frmula sheet CESAR Student
More informationB. Definition of an exponential
Expnents and Lgarithms Chapter IV - Expnents and Lgarithms A. Intrductin Starting with additin and defining the ntatins fr subtractin, multiplicatin and divisin, we discvered negative numbers and fractins.
More informationParagraph 1: Introduction
Editr s Name: Authr s Name: Date: Argument Essay EDITING WORKSHEET SPECIAL DIRECTIONS FOR EDITORS: ANY TIME YOU MARK NO ON THIS WORKSHEET, BE SURE TO ALSO MARK THIS ON THE WRITER S ACTUAL PAPER/ESSAY WITH
More informationAP Statistics Notes Unit Two: The Normal Distributions
AP Statistics Ntes Unit Tw: The Nrmal Distributins Syllabus Objectives: 1.5 The student will summarize distributins f data measuring the psitin using quartiles, percentiles, and standardized scres (z-scres).
More informationChapter 3: Cluster Analysis
Chapter 3: Cluster Analysis } 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries } 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA
More informationEric Klein and Ning Sa
Week 12. Statistical Appraches t Netwrks: p1 and p* Wasserman and Faust Chapter 15: Statistical Analysis f Single Relatinal Netwrks There are fur tasks in psitinal analysis: 1) Define Equivalence 2) Measure
More informationLesson Plan. Recode: They will do a graphic organizer to sequence the steps of scientific method.
Lessn Plan Reach: Ask the students if they ever ppped a bag f micrwave ppcrn and nticed hw many kernels were unppped at the bttm f the bag which made yu wnder if ther brands pp better than the ne yu are
More information(Communicated at the meeting of January )
Physics. - Establishment f an Abslute Scale fr the herm-electric Frce. By G. BOR ELlUS. W. H. KEESOM. C. H. JOHANSSON and J. O. LND E. Supplement N0. 69b t the Cmmunicatins frm the Physical Labratry at
More informationMODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards:
MODULE FOUR This mdule addresses functins SC Academic Standards: EA-3.1 Classify a relatinship as being either a functin r nt a functin when given data as a table, set f rdered pairs, r graph. EA-3.2 Use
More informationInterference is when two (or more) sets of waves meet and combine to produce a new pattern.
Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme
More informationA Quick Overview of the. Framework for K 12 Science Education
A Quick Overview f the NGSS EQuIP MODULE 1 Framewrk fr K 12 Science Educatin Mdule 1: A Quick Overview f the Framewrk fr K 12 Science Educatin This mdule prvides a brief backgrund n the Framewrk fr K-12
More informationDetermining the Accuracy of Modal Parameter Estimation Methods
Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system
More informationPreparation work for A2 Mathematics [2018]
Preparatin wrk fr A Mathematics [018] The wrk studied in Y1 will frm the fundatins n which will build upn in Year 13. It will nly be reviewed during Year 13, it will nt be retaught. This is t allw time
More informationAP Literature and Composition. Summer Reading Packet. Instructions and Guidelines
AP Literature and Cmpsitin Summer Reading Packet Instructins and Guidelines Accrding t the Cllege Bard Advanced Placement prgram: "The AP English curse in Literature and Cmpsitin shuld engage students
More informationChemistry 20 Lesson 11 Electronegativity, Polarity and Shapes
Chemistry 20 Lessn 11 Electrnegativity, Plarity and Shapes In ur previus wrk we learned why atms frm cvalent bnds and hw t draw the resulting rganizatin f atms. In this lessn we will learn (a) hw the cmbinatin
More information" 1 = # $H vap. Chapter 3 Problems
Chapter 3 rblems rblem At 1 atmsphere pure Ge melts at 1232 K and bils at 298 K. he triple pint ccurs at =8.4x1-8 atm. Estimate the heat f vaprizatin f Ge. he heat f vaprizatin is estimated frm the Clausius
More informationarxiv:hep-ph/ v1 2 Jun 1995
WIS-95//May-PH The rati F n /F p frm the analysis f data using a new scaling variable S. A. Gurvitz arxiv:hep-ph/95063v1 Jun 1995 Department f Particle Physics, Weizmann Institute f Science, Rehvt 76100,
More informationHubble s Law PHYS 1301
1 PHYS 1301 Hubble s Law Why: The lab will verify Hubble s law fr the expansin f the universe which is ne f the imprtant cnsequences f general relativity. What: Frm measurements f the angular size and
More informationPreparation work for A2 Mathematics [2017]
Preparatin wrk fr A2 Mathematics [2017] The wrk studied in Y12 after the return frm study leave is frm the Cre 3 mdule f the A2 Mathematics curse. This wrk will nly be reviewed during Year 13, it will
More informationSection 5.8 Notes Page Exponential Growth and Decay Models; Newton s Law
Sectin 5.8 Ntes Page 1 5.8 Expnential Grwth and Decay Mdels; Newtn s Law There are many applicatins t expnential functins that we will fcus n in this sectin. First let s lk at the expnential mdel. Expnential
More informationThe Law of Total Probability, Bayes Rule, and Random Variables (Oh My!)
The Law f Ttal Prbability, Bayes Rule, and Randm Variables (Oh My!) Administrivia Hmewrk 2 is psted and is due tw Friday s frm nw If yu didn t start early last time, please d s this time. Gd Milestnes:
More informationA Polarimetric Survey of Radio Frequency Interference in C- and X-Bands in the Continental United States using WindSat Radiometry
A Plarimetric Survey f Radi Frequency Interference in C- and X-Bands in the Cntinental United States using WindSat Radimetry Steven W. Ellingsn Octber, Cntents Intrductin WindSat Methdlgy Analysis f RFI
More informationSPH3U1 Lesson 06 Kinematics
PROJECTILE MOTION LEARNING GOALS Students will: Describe the mtin f an bject thrwn at arbitrary angles thrugh the air. Describe the hrizntal and vertical mtins f a prjectile. Slve prjectile mtin prblems.
More informationResampling Methods. Chapter 5. Chapter 5 1 / 52
Resampling Methds Chapter 5 Chapter 5 1 / 52 1 51 Validatin set apprach 2 52 Crss validatin 3 53 Btstrap Chapter 5 2 / 52 Abut Resampling An imprtant statistical tl Pretending the data as ppulatin and
More informationChapter 8: The Binomial and Geometric Distributions
Sectin 8.1: The Binmial Distributins Chapter 8: The Binmial and Gemetric Distributins A randm variable X is called a BINOMIAL RANDOM VARIABLE if it meets ALL the fllwing cnditins: 1) 2) 3) 4) The MOST
More informationIN a recent article, Geary [1972] discussed the merit of taking first differences
The Efficiency f Taking First Differences in Regressin Analysis: A Nte J. A. TILLMAN IN a recent article, Geary [1972] discussed the merit f taking first differences t deal with the prblems that trends
More informationPhysics 212. Lecture 12. Today's Concept: Magnetic Force on moving charges. Physics 212 Lecture 12, Slide 1
Physics 1 Lecture 1 Tday's Cncept: Magnetic Frce n mving charges F qv Physics 1 Lecture 1, Slide 1 Music Wh is the Artist? A) The Meters ) The Neville rthers C) Trmbne Shrty D) Michael Franti E) Radiatrs
More informationMethods for Determination of Mean Speckle Size in Simulated Speckle Pattern
0.478/msr-04-004 MEASUREMENT SCENCE REVEW, Vlume 4, N. 3, 04 Methds fr Determinatin f Mean Speckle Size in Simulated Speckle Pattern. Hamarvá, P. Šmíd, P. Hrváth, M. Hrabvský nstitute f Physics f the Academy
More information**DO NOT ONLY RELY ON THIS STUDY GUIDE!!!**
Tpics lists: UV-Vis Absrbance Spectrscpy Lab & ChemActivity 3-6 (nly thrugh 4) I. UV-Vis Absrbance Spectrscpy Lab Beer s law Relates cncentratin f a chemical species in a slutin and the absrbance f that
More informationAssociated Students Flacks Internship
Assciated Students Flacks Internship 2016-2017 Applicatin Persnal Infrmatin: Name: Address: Phne #: Years at UCSB: Cumulative GPA: E-mail: Majr(s)/Minr(s): Units Cmpleted: Tw persnal references (Different
More informationIntroduction to Spacetime Geometry
Intrductin t Spacetime Gemetry Let s start with a review f a basic feature f Euclidean gemetry, the Pythagrean therem. In a twdimensinal crdinate system we can relate the length f a line segment t the
More information1 The limitations of Hartree Fock approximation
Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants
More informationCOMP 551 Applied Machine Learning Lecture 4: Linear classification
COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted
More informationLab #3: Pendulum Period and Proportionalities
Physics 144 Chwdary Hw Things Wrk Spring 2006 Name: Partners Name(s): Intrductin Lab #3: Pendulum Perid and Prprtinalities Smetimes, it is useful t knw the dependence f ne quantity n anther, like hw the
More informationLHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers
LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the
More information