A RATIONALE FOR AN ASYMPTOTIC LOGNORMAL FORM OF WORD-FREQUENCY DISTRIBUTIONS. John B. Carroll


RESEARCH BULLETIN RB-69-9

This Bulletin is a draft for interoffice circulation. Corrections and suggestions for revision are solicited. The Bulletin should not be cited as a reference without the specific permission of the author. It is automatically superseded upon formal publication of the material.

Educational Testing Service
Princeton, New Jersey
November 1969

Abstract

The lognormal distribution has been found to fit word-frequency distributions satisfactorily if account is taken of the relations between populations and samples. A rationale for an asymptotic lognormal distribution is derived by supposing that the probabilities at the nodes of decision trees are symmetrically distributed around .5 with a certain variance. By the central limit theorem, the logarithms of the continued products of probabilities randomly sampled from such a distribution would have an asymptotically normal distribution. Two mathematical models incorporating this notion are developed and tested; in one, the number of factors in the continued products is assumed to be fixed, while in the other, that number is dependent upon a Poisson distribution. Psycholinguistic processes corresponding to these models are postulated and illustrated with reference to two sets of data: (1) word associations to the stimulus LIGHT, and (2) the Lorge Magazine Count. Reasonable fits to observed data or to underlying lognormal distributions are obtained, but there remain certain problems in estimating parameters.

A number of writers (e.g., Herdan, 1960; Howes, 1964; Somers, 1959) have pointed out that the distribution of word frequencies in continuous texts can be approximated by a lognormal distribution. In previous publications (Carroll, 1967, 1968) I have shown that if a lognormal model is assumed: (a) the form and parameters of the distribution for a finite sample will deviate systematically from those assumed for an infinite population of tokens; (b) the theoretical population of word-types is ordinarily finite in magnitude; and (c) the relation between the number of types and the number of tokens (i.e., the type-token function) can be expressed analytically. Further, I showed that for several well-known sets of data, the lognormal model gives excellent fit if account is taken of the relation between the theoretical population and a finite sample, a relation that previous investigators had not taken into account.

There are at least two other models that have been proposed for word-frequency distributions: Mandelbrot's (1961) elaboration of the laws of Estoup and Zipf, and Simon's (1957) skew probability distribution. Mandelbrot's and Simon's models are in each case based on a rationale that attempts to explain the genesis of the distribution in terms of logical or psychological considerations. Thus far, no detailed rationale for a lognormal model has been proposed, and it is the purpose of the present paper to remedy this lack. It is hoped that in this way one will be in a better position to choose among the various competing models.

The rationale that I propose here attempts to exhibit connections with psycholinguistic theory, in the sense that it attempts to throw light on some of the possible psychological processes involved in the emission of words in spoken or written composition or in such tasks as the word-association test (where the subject is presented with a stimulus word and asked to give "the first word that

comes to mind"). A possible starting point is suggested by certain mathematical theories concerning the genesis of lognormal distributions. Aitchison and Brown (1957) have reviewed these theories, which go back to the work of Kapteyn (1903). Somers (1959), Miron and Wolfe (1964), and DiVesta (1964) have referred to Aitchison and Brown's discussion with special reference to certain word-frequency distributions, but have not explicated in detail how Kapteyn's model might apply to such distributions. In the rationale to be presented here, it turns out that it will not be necessary to refer to Kapteyn's work; instead, the rationale will be based on a multiplicative analogue of the central limit theorem. Further, this development predicts that word-frequency distributions will be only asymptotically lognormal in form.

For the orientation of the reader, a brief introduction to certain characteristics of word-frequency distributions will now be given. The terms type and token will be explained, and the lognormal analysis of word-frequency distributions will be illustrated for a concrete example.

Type and Token Distributions

Russell and Jenkins (1954) collected data on word-association responses of 1008 college students to each of 100 words appearing in the Kent-Rosanoff Word Association Test. Each subject was asked to give the (one) first word that came to mind as each word was presented in printed form. One of the words in the list was LIGHT; for this word, data were available for 1005 Ss. The most frequent response was DARK, given by 647 Ss; the next most frequent was LAMP, given by 78 Ss; etc. There were 57 different words that were each given by only one S: words such as LOOK, SHINE, HIGH, OUT, etc. The data can be arranged in a frequency distribution, or more precisely, a distribution of the frequency $f_h$ with which words occurred with a given occurrence frequency ($h$), as shown in Table 1. The sum of the frequencies ($\sum f_h$) is the total number of different words, or types, in the

data; let this number (92, in this case) be designated $n$. If we now multiply each occurrence frequency ($h$) by its frequency ($f_h$), we obtain the number of tokens (number of responses) accounted for by each occurrence frequency; summing these, we arrive at the total number of responses (tokens) in the data, designated $N$. (Thus, $N = \sum_h h f_h = 1005$ for the data in Table 1.)

Insert Table 1 about here

We can now apply various statistical measures and techniques for describing these data. The basic variable here, it should be emphasized, is the occurrence frequency ($h$), defined as the frequency with which a given type (or word-class)² occurs in a collection of $N$ tokens. We can convert the occurrence frequency to a proportional occurrence frequency, $p'$, by dividing $h$ by $N$, that is, $p' = h/N$, and, in turn, it is useful for certain purposes to take the logarithm of $p'$; we thus define $\varphi' = \log p'$. (Primes denote that the statistics are based on a sample. Logarithms to the base 10 are employed unless otherwise noted.) In effect, we can think of two distributions of the variable $h$ (or of its transforms $p'$ or $\varphi'$): one is the distribution of types (with frequencies $f_h$), the other is the distribution of tokens (with frequencies $h f_h$). It is possible to apply ordinary statistical methods to obtain the moments of either the type distribution or the token distribution, but we will have no use for such statistics. Rather, we will show immediately the method of graphical representation that has been used by the writer and his predecessors to suggest that a lognormal model may provide a reasonably good fit to the data.

We assume, first, that the occurrence frequency ($h$) reflects an underlying continuous variable of word probability; thus, the frequency given for $h = 1$, for example, may be regarded as the frequency for the interval $1/2 < h < 1\,1/2$. We therefore compute $p'$ and the corresponding $\varphi'$ for the upper boundaries of each interval, as shown in the fourth

and fifth columns of Table 1. We then compute the cumulative proportional type and token frequencies (cumulating from the least frequent types) and find their corresponding normal deviates using tables of normal curve areas; Kelley's (1938) table is particularly useful for this since the argument is the normal curve area from .0001 to .9999 in steps of .0001. (In practice, the normal deviate values are found by a computer algorithm.) The resulting values are shown in the last four columns of Table 1. We now plot the normal deviates against the upper-boundary-of-interval values of $\varphi'$, to obtain the points shown by open circles in Figure 1. The upper set of points is for the type distribution; the lower set is for the token distribution.

Insert Figure 1 about here

The fact that these sets of points are approximately rectilinear has been taken as evidence for a lognormal form of the distribution, since the lognormal form would indeed yield points following a perfectly straight line. (Similar plots can be obtained by plotting cumulative proportions directly against frequencies on logarithmic probability paper; for various reasons I find it more convenient to convert frequencies to logarithms of proportions and to convert cumulative proportions to normal deviates, dispensing with the use of probability paper.)

Nevertheless, in my previous papers (Carroll, 1967, 1968) I presented mathematical developments showing that even if one assumes that the model is lognormal in the population, sample data will not in general have a strict lognormal form; rather, the plots for sample type and token distributions will tend to deviate systematically from this form. The fitted lines for both type and token distributions would be expected, in general, to have a slight curvature. By methods essentially the same as those outlined previously (Carroll, 1968), the parameters of the theoretical lognormal population underlying the data for associations to

LIGHT (Table 1 and Figure 1) may be estimated to be $\mu$ = …, $\mu_T$ = …, and $\sigma$ = …, all in the metric of the variable $\varphi$, the common logarithm of $\pi$, the proportional occurrence frequency in the population (the analogue of the variable $\varphi'$ mentioned above).³ Here, $\mu$ is the mean of the theoretical token distribution, $\mu_T$ is the mean of the theoretical type distribution, and $\sigma$ is the standard deviation of each of these distributions, which by lognormal theory have the same variance (Aitchison & Brown, 1957, Theorem 2.6). The lognormal plots corresponding to these parameters are shown as solid lines in Figure 1; the predicted type and token distributions for a sample of 1005 tokens are shown as broken lines. The fit between predicted and observed distributions is excellent both by visual standards and by $\chi^2$ tests made with judicious selection of class boundaries, as shown in Table 2.

Insert Table 2 about here

Another illustrative analysis is shown in Figure 2, which pertains to data assembled from the Lorge Magazine Count (Thorndike & Lorge, 1944) by Dr. Davis Howes (personal communication). The fit between predicted and observed data is again excellent; the predicted curves, in fact, show the same curvature tendencies as those for the observed data.

Insert Figure 2 about here

The actual observed and predicted frequencies for the type and token distributions are shown in Table 3; as judged by a $\chi^2$ test, the fit is not good, very large frequencies being involved. However, it will be noted that the ratios of observed and predicted frequencies are generally close to unity.

Insert Table 3 about here

The computations for this set of data used the "delete-one-half-word-type" procedure mentioned but not actually carried out in a previous article (Carroll, 1968, pp. …); this procedure appears to yield a slightly better fit than the procedure that uses the complete right-hand tail of the theoretical type distribution, and we shall mention, below, a possible reason why this may be so. The parameters for this case (shown in Figure 2) are, of course, quite different from those for the word-association data.

With this introduction, we pass to the development of a rationale. First, we will develop the mathematical basis; subsequently, we attempt to link the mathematical model with certain language phenomena. Finally, we apply the rationale to the two empirical sets of data that have been presented as illustrations.

Model I

Actually, two slightly different mathematical rationales will be presented. Both depend upon what Aitchison and Brown (1957, Theorem 2.8) call a multiplicative analogue for the Lindeberg-Lévy form of the additive central limit theorem of mathematical statistics: "If $\{x_j\}$ is a sequence of independent, positive variates having the same probability distribution and such that $E\{\log x_j\} = \mu$ and $\mathrm{Var}\{\log x_j\} = \sigma^2$ both exist, then the product $\prod_{j=1}^{m} x_j$ is asymptotically distributed as $\Lambda(m\mu, m\sigma^2)$." [Slight changes in notation have been made.⁴] This means that the logarithm of the product $\prod_{j=1}^{m} x_j$ is distributed (asymptotically) normally, but with mean and variance equal to the mean and variance of $\{\log x_j\}$ multiplied (respectively) by a constant $m$. Let $z_j = \log x_j$, and denote the logarithm of the product as $t$, that is,

$$t = \log\Bigl(\prod_{j=1}^{m} x_j\Bigr) = \sum_{j=1}^{m} \log x_j = \sum_{j=1}^{m} z_j .$$

Then $\{t\}$ is distributed (asymptotically) normally. We will assume that real noninteger values of $m$ are permissible as well as integer values ($m > 1$); for any such value, a part of $\log x_{(m'+1)}$ proportional to the fractional part of $m$ would be added into the sum, where $m'$ is the integral part of $m$. (Henceforth, the subscript $j$ will be dropped.)
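The theorem quoted above can be checked numerically. The following is a minimal sketch and not part of the bulletin: it assumes, purely for illustration, that each factor x is rectangular on (0, 1), for which E{ln x} = -1 and Var{ln x} = 1, so that t should have mean close to -m and variance close to m. The function name `log_products` and the sample sizes are hypothetical choices.

```python
import math
import random
import statistics


def log_products(m, n, seed=0):
    """Return n simulated values of t = ln(x_1 * ... * x_m).

    Assumption (illustrative only): each x_j is rectangular on (0, 1].
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        t = sum(math.log(1.0 - rng.random()) for _ in range(m))
        values.append(t)
    return values


if __name__ == "__main__":
    for m in (2, 10, 50):
        t = log_products(m, 20000)
        # The sample mean and variance should both scale with m.
        print(m, round(statistics.fmean(t), 2), round(statistics.pvariance(t), 2))
```

With the rectangular assumption the skewness of t shrinks in proportion to the inverse square root of m, which is one way to see why the resulting distribution is only asymptotically normal.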

It will be observed that the ratio of the variance to the mean is the same for the initial distribution $\{z\}$ and the terminal distribution $\{t\}$, and that if means and variances are known for both the initial and the terminal distributions the value of $m$ can be determined:

$$m = \frac{E\{t\}}{E\{z\}} = \frac{\mathrm{Var}\{t\}}{\mathrm{Var}\{z\}} \qquad (1)$$

For convenience, the negative ratio of the variance to the mean will be denoted $\rho_1$:

$$\rho_1 = -\frac{\mathrm{Var}\{t\}}{E\{t\}} = -\frac{\mathrm{Var}\{z\}}{E\{z\}} \qquad (2)$$

We will now also define the ratio $\rho_2$, to be used in Model II:

$$\rho_2 = -\frac{E\{z^2\}}{E\{z\}} \qquad (3)$$

The rationale we shall develop requires that $x$ is symmetrically distributed such that $0 < x < 1$ and $E\{x\} = 1/2$, with its variance being a single-valued function of a parameter $v$ ($v > 0$). Many such distributions could be constructed, but we find it advantageous to consider a distribution derived from a variable $y$ which is distributed rectangularly in the interval $0 < y < 1$, such that

$$x = \begin{cases} 2^{-(v-1)/v}\, y^{1/v} & (0 < y \le 1/2) \\ 1 - 2^{-(v-1)/v}\, (1 - y)^{1/v} & (1/2 < y < 1) \end{cases} \qquad (4)$$

The density function of this transformed variable is

$$D(x) = \begin{cases} v\, 2^{v-1} x^{v-1} & (0 < x \le 1/2) \\ v\, 2^{v-1} (1 - x)^{v-1} & (1/2 < x < 1) \end{cases} \qquad (5)$$

and its definite integral from 0 to $x$ is

$$\int_0^x D(x)\,dx = \begin{cases} 2^{v-1} x^{v} & (0 < x \le 1/2) \\ 1 - 2^{v-1} (1 - x)^{v} & (1/2 < x < 1). \end{cases} \qquad (6)$$

Figures 3 and 4 sketch the density and integral functions of $\{x\}$ for selected values of $v$.

Insert Figures 3 and 4 about here

Thus, $x$ may vary from something like a U-shaped distribution to an inverted-V distribution; when $v \to \infty$, values of $x$ are limited to .5. An intermediate form is the rectangular distribution when $v = 1$. (Appendix A gives a computer program for determining the density and integral functions for any real positive value of $v$.) The form selected for the distribution $\{x\}$ is admittedly arbitrary, but its exact form probably does not matter substantially in view of the operations performed on it (see below).

Of interest are the expected mean and variance of $z$ ($= \ln x$) as a function of $v$ (where ln denotes the natural logarithm). It can be shown⁵ that these are given as follows, for any real positive value of $v$:

$$E\{z\} = -\frac{1}{2}\left[\ln 2 + \frac{1}{v} + v \sum_{r=1}^{\infty} \frac{1}{r\,(v + r)\,2^{r}}\right] \qquad (7)$$

$$E\{z^2\} = \frac{1}{2}(\ln 2)^2 + \frac{1}{v}\ln 2 + \frac{1}{v^2} - v \sum_{r=2}^{\infty} \frac{C_r}{2^{r}\,(v + r)} \qquad (8)$$

where for $r = 2, 3, 4, \ldots$,

$$C_r = -\frac{1}{r}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{r-1}\right), \qquad (9)$$

whence, of course, $\mathrm{Var}\{z\} = E\{z^2\} - [E\{z\}]^2$.
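The series for E{z} and E{z²} and the inversion for v under Model I can be sketched as follows. This is a minimal illustration under stated assumptions, not the Appendix C program, which is not reproduced here. The coefficient convention is C_r = -(1 + 1/2 + ... + 1/(r-1))/r, which reproduces the known rectangular case (v = 1: E{z} = -1, E{z²} = 2). The function names, the bracketing interval, and the tolerance are hypothetical, and the bisection assumes that the ratio of -Var{z} to E{z} decreases as v increases over the bracket.

```python
import math


def moments_z(v, tol=1e-12, max_terms=200):
    """Return (E{z}, E{z^2}, Var{z}) for z = ln x, by series summation."""
    if v <= 0:
        raise ValueError("v must be positive")
    s1 = 0.0        # sum over r >= 1 of 1 / (r (v + r) 2^r)
    s2 = 0.0        # sum over r >= 2 of C_r / (2^r (v + r))
    harmonic = 0.0  # 1 + 1/2 + ... + 1/(r - 1), updated after each r
    for r in range(1, max_terms + 1):
        term1 = 1.0 / (r * (v + r) * 2.0 ** r)
        s1 += term1
        if r >= 2:
            c_r = -harmonic / r
            s2 += c_r / (2.0 ** r * (v + r))
        harmonic += 1.0 / r
        if term1 < tol:
            break
    ln2 = math.log(2.0)
    mean = -0.5 * (ln2 + 1.0 / v + v * s1)
    mean_sq = 0.5 * ln2 ** 2 + ln2 / v + 1.0 / v ** 2 - v * s2
    return mean, mean_sq, mean_sq - mean ** 2


def rho1(v):
    """Negative ratio of the variance of z to its mean."""
    mean, _, var = moments_z(v)
    return -var / mean


def solve_v_model1(target, lo=0.05, hi=50.0, iterations=100):
    """Bisection for v such that rho1(v) equals target (assumed bracketed)."""
    f_lo = rho1(lo) - target
    if f_lo * (rho1(hi) - target) > 0:
        raise ValueError("target ratio is not bracketed by [lo, hi]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = rho1(mid) - target
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def depth_model1(mean_t, v):
    """m = E{t} / E{z}, the first form of the relation for m."""
    return mean_t / moments_z(v)[0]
```

Given an estimated mean and variance of a terminal distribution, one would call `solve_v_model1(-var_t / mean_t)` and then `depth_model1(mean_t, v)`.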

Appendix C includes a program for computing $E\{z\}$, $E\{z^2\}$, $\mathrm{Var}\{z\}$, and the ratios $\rho_1 = -\mathrm{Var}\{z\}/E\{z\}$ and $\rho_2 = -E\{z^2\}/E\{z\}$ for any real positive value of $v$ (or a sequence of such values). The series approximations involved in these computations are considered converged to satisfactory values when the $r$-th value of a term governed by a summation sign is less than $10^{-6}$. The program also permits iterative computation of $v$ and $m$ for given values of $E\{t\}$ and $\mathrm{Var}\{t\}$ on the basis of Model I. Appendix D gives tables of $E\{z\}$, $E\{z^2\}$, $\mathrm{Var}\{z\}$, $\rho_1$ and $\rho_2$ for the range of $v$ that has been found to be pertinent for the development to follow.

Model II

Model I assumed that each quantity in the terminal distribution $\{t\}$ was obtained by determining the sum of $m$ independent values sampled randomly from the distribution $\{z\}$. As was noted previously, $m$ was permitted to take nonintegral values. Suppose now, however, that $m$ itself is a random variable, such that each quantity in the terminal distribution, designated $\{u\}$ in this Model II, is obtained as the sum of some random number, $m + 1$, of values sampled randomly from the distribution $\{z\}$. For this case, we allow $m$ to take only integral values ($m = 0, 1, 2, \ldots$). If $\{m\}$ is a random distribution, it could take any desired form. For simplicity and convenience, however, we will assume that it is described by a Poisson distribution with a single parameter $\lambda$ (which may be any real number $\ge 0$), where

$$\lambda = E\{m\} = \mathrm{Var}\{m\} \qquad (10)$$

The proportional frequencies for $m = 0, 1, 2, \ldots$ are given by the respective terms of the infinite series (which sums to unity)

$$e^{-\lambda}\,[\,1,\ \lambda,\ \lambda^2/2!,\ \lambda^3/3!,\ \ldots,\ \lambda^m/m!,\ \ldots\,] \qquad (11)$$
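The generating process of Model II can be sketched as a short simulation. This is an illustrative sketch only, not the bulletin's Appendix F program: the function names are hypothetical, the Poisson sampler is Knuth's multiplication method (adequate for small values of the Poisson parameter), and x is drawn by the transformation of a rectangular variable given earlier as equation (4). For the rectangular case v = 1, where E{z} = -1 and Var{z} = 1, a standard result for sums of a random number of terms gives a mean of -(lam + 1) and a variance of (lam + 1) + lam for u.

```python
import math
import random
import statistics


def sample_x(v, rng):
    """Draw x from the symmetric distribution of equation (4)."""
    y = 1.0 - rng.random()              # rectangular on (0, 1]
    scale = 2.0 ** (-(v - 1.0) / v)
    if y <= 0.5:
        return scale * y ** (1.0 / v)
    return 1.0 - scale * (1.0 - y) ** (1.0 / v)


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson variate."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_u(lam, v, n, seed=0):
    """Return n values of u, each the sum of m + 1 values of z = ln x."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        m = sample_poisson(lam, rng)    # number of additional factors
        values.append(sum(math.log(sample_x(v, rng)) for _ in range(m + 1)))
    return values


if __name__ == "__main__":
    lam, v = 3.0, 1.0
    u = simulate_u(lam, v, 20000)
    # Rectangular case: predicted mean -(lam + 1), variance (lam + 1) + lam.
    print(round(statistics.fmean(u), 2), -(lam + 1.0))
    print(round(statistics.pvariance(u), 2), (lam + 1.0) + lam)
```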

When $m = 0$, only one value is sampled from the distribution $\{z\}$ to form a value in the terminal distribution $\{u\}$. The random variable $\{m\}$ is thus in effect a measure of how many additional values are sampled from $\{z\}$ beyond the first, and summed to form each value of $\{u\}$. Because of the central limit theorem that has been utilized in Model II as in Model I, it may be presumed that the distribution $\{u\}$ will have an asymptotic normal form as $\lambda \to \infty$. The expected mean and variance of the terminal distribution $\{u\}$ as functions of the mean and variance of the initial distribution $\{z\}$ can be found by applying the central limit theorem to successive fractions of the initial distribution $\{z\}$, each fraction being weighted by its proportional frequency from (11). By this procedure, it can be shown (Appendix E) that

$$E\{u\} = (\lambda + 1)\,E\{z\} \qquad (12)$$

and

$$\mathrm{Var}\{u\} = (\lambda + 1)\,\mathrm{Var}\{z\} + \lambda\,E^2\{z\}. \qquad (13)$$

If $E\{u\}$ and $E\{z\}$ are given, it is clear that

$$\lambda = \frac{E\{u\}}{E\{z\}} - 1. \qquad (14)$$

Further, it can be shown (Appendix E) that for any value of $\lambda$,

$$\frac{\mathrm{Var}\{u\} + E^2\{z\}}{E\{u\}} = \frac{E\{z^2\}}{E\{z\}}. \qquad (15)$$

The negative value of this ratio is what has previously been defined as $\rho_2$, that is,

$$\rho_2 = -\frac{E\{z^2\}}{E\{z\}} = -\frac{\mathrm{Var}\{z\} + E^2\{z\}}{E\{z\}}.$$

Therefore, given the mean, $U$, and variance, $S_u^2$, of a terminal distribution $\{u\}$, the maximum likelihood value of the parameter $v$ of the initial distribution $\{z\}$ can be determined by finding the value of $v$ that makes equation (15) true, substituting $U$

for $E\{u\}$ and $S_u^2$ for $\mathrm{Var}\{u\}$. That is, one locates the value of $v$ such that $-(S_u^2 + E^2\{z\})/U$ is equal to the value of $\rho_2$ that corresponds to $v$. A reasonable estimate of $\lambda$ can then be determined by equation (14), substituting $U$ for $E\{u\}$. Appendix D contains tables of $\rho_2$ as a function of $v$ for the distribution $\{z\}$ described previously. Appendix C is a computer program that permits determining, by an iterative interpolation procedure, $v$ and $\lambda$ for given values of the mean, $U$, and variance, $S_u^2$, of a terminal distribution on the basis of Model II.

This completes the exposition of two slightly different mathematical models that will be employed in an attempt to account for the asymptotic lognormal form of word-frequency distributions.

The Rationale

The rationale will first be presented in somewhat intuitive form, the mathematical features being introduced only subsequently. Let us consider the data from the word-association responses to the stimulus "LIGHT." It is possible to speculate on the motivation for any given response. For example, the subject who responds with DARK has evidently interpreted the stimulus as an adjective having a meaning associated with an aspect of visual experience; for one reason or another, he gives another adjective as his response, and he gives an adjective whose meaning is opposite to that of LIGHT. A subject who responds with HEAVY interprets the stimulus as an adjective having to do with kinesthetic experience; like the first subject, however, he gives an adjective with an opposite meaning. A subject who gives the response LAMP may have interpreted the stimulus as a noun meaning a "source of light," and he is presumably set to give another noun as his response, a noun which cites a specific example of an

artificial source of light. Or he may have interpreted the stimulus as a verb meaning roughly "to set burning" or "kindle"; he is set to give as his response a noun that can be the object of such a verb, as in "to light the lamp."

I have attempted to make a detailed analysis of the responses of 1001 subjects in the Russell-Jenkins data for "LIGHT" in terms of the kinds of stimulus interpretations and response processing involved in each case. (Four of the 1005 responses seemed difficult to interpret.) My analysis is, of course, purely subjective and speculative, but it consists of a series of hypotheses that presumably could be verified by certain operations. The results of my analysis are exhibited in the form of a "tree structure," shown in Figure 5.

Insert Figure 5 about here

As may be seen by inspecting the chart, I postulate that in responding to a stimulus such as LIGHT, a subject acts as if he performs a series of "decisions." These decisions may be classified into at least four kinds:⁷

1. Semantic interpretation of the stimulus
2. Grammatical interpretation of the stimulus
3. Grammatical processing of the response to the stimulus
4. Further processing of the responses to the stimulus in terms of various semantic and other factors.

These four types of decisions are shown in different levels of the chart; the uppermost level of the chart, however, combines both semantic and grammatical interpretations of the stimulus.

Starting at the top middle with the stimulus LIGHT, we may suppose that the first "decision" the subject must make is whether to interpret LIGHT as a word having to do with visual experience or with kinesthetic experience (weight). From my analysis, the proportion of Ss making the former interpretation is .965. At the next level, we are concerned with what part of speech the stimulus

is parsed in. Let us assume that all the "decisions" are binary. Of the Ss who construe LIGHT as having to do with visual experience, about 72% take it to be an adjective. Of the 27.9% construing it as other than an adjective, 82.7% interpret it as a noun. As a noun, LIGHT has at least four basic meanings: (1) a metaphorical concept analogous to TRUTH; (2) a physical concept (light waves or particles); (3) the specific manifestation of light as the illumination of daylight or moonlight; and (4) an object that gives off artificial light. Binary decisions that could lead to these four meanings are shown in the chart.

At the next level of the chart, we assume that the subject has interpreted the stimulus in terms of its semantics and grammar; he then "processes" it in terms of the type of grammatical response he will emit. For the purposes of the chart, I assume that the response will be either paradigmatic (same part of speech) or syntagmatic (different part of speech). It will be seen that paradigmatic responses are predominant throughout the chart. But various types of syntagmatic responses may occur: forming a noun phrase, choosing a verb or an adjective, making a noun compound, etc.

The further "semantic" processing of the responses is shown in the next level of the chart, e.g., giving a "contrast" response vs. a "noncontrast" response, giving a "direct" vs. a "coordinate" response, etc. Sometimes it is possible to make rather detailed classifications of these "semantic processing" responses; for example, towards the left-hand side of the chart, syntagmatic responses to LIGHT construed as an adjective having to do with visual experience may be classified into those which give "things characterized" (e.g., hair) and those that identify colors either in general terms (color, shade) or in specific terms (red, yellow, green, etc.). Even these colors (red, yellow, green, etc.) might be further classified in a binary way to yield boxes of terminal responses containing only one type, but this has not been attempted.

Some of the nodes in the "semantic processing" phases of Figure 5 may correspond to the operation of the "idiodynamic sets" in word association that have been identified by Moran, Mefferd and Kimble (1964). For example, a type of node that occurs quite frequently, particularly just below paradigmatic decisions in the grammatical processing phase, is that which divides into "contrast" and "noncontrast" branches; Carroll, Kjeldergaard and Carton (1962) have shown that individuals differ consistently with respect to the probability that they will make "contrast" or "noncontrast" responses in a word-association task.

The reader may follow the various paths to terminal responses to apprehend and evaluate the hypotheses underlying the tree structure. Two points deserve mention. (1) Some responses have been classified in two paths; for example, one of the more frequent responses, lamp, is assigned to two terminal response positions, first as a paradigmatic response to LIGHT meaning a source of light, and second as a syntagmatic response to LIGHT as a verb. In such cases, the observed frequencies have been divided in two equal parts. (2) I have allowed for a few terminal responses that did not actually appear in the data, merely to show that such responses might have appeared. Thus, kindle might have occurred as a paradigmatic response to LIGHT as a verb meaning "set burning."

The 92 different responses that appeared in the Russell-Jenkins data are only a small sample of those that might have appeared. According to the formula

$$N_T = \exp\left[\tfrac{1}{2}(\sigma \ln 10)^2 - \mu \ln 10\right],$$

where $N_T$ is the number of types in the theoretical population by lognormal theory, and $\mu$ and $\sigma^2$ are in terms of common logarithms, $N_T$ is estimated to be 8594 for the word-association data; that is, some 8594 different responses might eventually have been obtained if sampling had been continued indefinitely.

This tree structure of word associations to LIGHT is offered not as a clearly verified analysis but as a possible realization of a model for the generation of an

asymptotically lognormal distribution. It does not matter whether the classifications I have made are correct; what matters is that it is possible to make such classifications, and more importantly, that the probabilities of choice as given at the nodes of the tree structure tend to be distributed in some random fashion. Of necessity, the probabilities of choice will be symmetrically distributed around a 50-50 split; some of them may be extreme splits, e.g., .999 vs. .001.

The probabilities of choice shown at the nodes of the tree structure in Figure 5 are probabilities of tokens for the given path. For example, .965 of all responses in the data appear to be responses to LIGHT as having to do with visual experience, and .035 appear to be responses to LIGHT as having to do with weight. The final token probability of each terminal response can be regarded as a continued product of the probabilities at the nodes of the path leading to that response. For example, the probability of the response dark or black (total frequency = 652.5, token probability = .6518) is the continued product .965 × .721 × .975 × .962 × .998.

We can begin to see now how an asymptotic lognormal distribution of token probabilities might arise. By reference to the multiplicative analogue of the central limit theorem mentioned earlier, it appears that if $x$ is a variable distributed over the interval $0 < x < 1$, the logarithm of the product $\prod_{j=1}^{m} x_j$ will be distributed asymptotically normally. The "choice probabilities" shown at the nodes of the tree structure in Figure 5 constitute the random variable $x$ of the mathematical model, and they are distributed symmetrically because for each probability $p$ there is a corresponding probability $(1 - p)$. To provide a further connection with the mathematical models presented earlier, let us assume, in fact, that the choice probabilities are distributed as specified by equation (4). We would expect, then, that the population token distribution would be asymptotically lognormal.
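The continued product for a single path can be worked through directly. The sketch below is illustrative only; the function names and the constant `DARK_PATH` are hypothetical, and the node probabilities are the five values quoted above for the path ending in dark or black. Their product is about .6513, which agrees with the reported token probability of .6518 to within the rounding of the node probabilities.

```python
import math


def path_probability(node_probabilities):
    """Continued product of the choice probabilities along one path."""
    for p in node_probabilities:
        if not 0.0 < p <= 1.0:
            raise ValueError("each node probability must lie in (0, 1]")
    return math.prod(node_probabilities)


def log_path_probability(node_probabilities):
    """Sum of natural logarithms of the node probabilities.

    This is the quantity the central limit argument applies to.
    """
    return sum(math.log(p) for p in node_probabilities)


# Node probabilities quoted in the text for the path to "dark or black".
DARK_PATH = [0.965, 0.721, 0.975, 0.962, 0.998]

if __name__ == "__main__":
    print(round(path_probability(DARK_PATH), 4))
    print(round(log_path_probability(DARK_PATH), 4))
```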
By Model I, the "depth," $m$, of the tree structure would be a constant; that is, each terminal response would have a probability yielded

by the continued product of the probabilities at the $m$ nodes of the path leading to that response (all paths being of equal length). To apply the model, we would start with the estimated values of the mean and variance of a terminal distribution $\{t\}$, the theoretical distribution of token probabilities in the population. By finding the value of $v$ corresponding to $\rho_1$ of equation (2), a suitable distribution $\{x\}$ for the initial probabilities could be found, and by equation (1) the value of $m$ could be determined.

By Model II, on the other hand, it would be assumed that the "depth" of the tree structure would not be uniform, some paths being longer than others. Such an assumption seems more reasonable than that of constant depth, since words vary on a generality-specificity dimension. Again starting with the estimated values of the mean and variance of the terminal distribution of token probabilities in the population, we would find a suitable distribution of $x$ by finding the value of $v$ corresponding to $\rho_2$ to make equation (15) true, and determine the average value of $m$ (as defined for Model II) by use of equation (14).

A similar line of reasoning could apply to type distributions. The probabilities in the tree structure would be assumed to be calculated not in terms of the number of tokens (responses) yielded by a particular path at a given node, but in terms of the number of types so yielded. For example, in Figure 5, the number of types yielded by the "Visual Experience" branch of the first node is 80, or .80 of the 100 types shown in the terminal response boxes, while the number of types yielded by the "Weight" branch is 20, or .20 of the total number of types. (These, however, are only estimates of the corresponding probabilities in the population.) These probabilities also could be regarded as constituting a random variable $x$ with some value of the parameter $v$. Parallel operations with the formulations of Models I and II could then be carried out.

Tests of the Models

The two sets of illustrative data presented previously will now be analyzed to test the fit of Models I and II to the data.
The general procedure is as follows:

(1) Since the models apply more immediately to the theoretical populations than to the observed distributions, the parameters $v$ and $m$ or $\lambda$ are estimated (by procedures developed in the mathematical section of this paper) from the theoretical populations that in turn had been estimated by lognormal procedures (Carroll, 1968). The computer program in Appendix C is used for this purpose; the program in Appendix A computes the density and integral functions of $\{x\}$ for the value of $v$.

(2) By the use of either Model I or Model II, terminal distributions are generated by Monte-Carlo procedures in which the probability of occurrence of a terminal probability (actually, its logarithm) is independent of the terminal probability itself. In this way we avoid the sampling considerations that arise in drawing finite samples of types or tokens from terminal distributions, where the occurrence of a type or token is a fortiori dependent upon its theoretical probability. By generating large numbers of values for the terminal distributions, with $N$ equal to either 1, or 5,, those distributions will have shapes closely congruent with those that would be generated by letting $N$ approach infinity. The computer program for the Monte-Carlo generation of terminal distributions is shown in Appendix F, with appropriate documentation.

(3) The fit between the model-generated terminal distributions and the theoretical lognormal distributions is evaluated by graphical inspection. It is not expected, however, that the fit will necessarily be close, because the model-generated distributions are expected to be only asymptotically normal.

(4) When possible, a further computer procedure gives the expected type and token distributions for $N$ tokens sampled from the model-generated type distributions, $N$ being identical to the number of tokens in the observed data. The sample type and token distributions so generated are then compared with the observed distributions by graphical inspection. The computer program for drawing

these finite samples from terminal type distributions is shown in Appendix G with associated documentation. The theory by which finite samples of tokens are drawn from theoretical population distributions is the same as published previously (Carroll, 1967, 1968), except that in the present case the theoretical population type distributions are only asymptotically lognormal.

Because of special considerations that must be introduced with the data for word associations to LIGHT, we will start with the analysis of the data from the Lorge Magazine Count, on the assumption that the models presented apply not only to word association responses but also equally well to words in continuous discourse.

Lorge Magazine Count Data

The parameters estimated for the theoretical lognormal distribution were $\mu$ = …, $\mu_T$ = …, $\sigma$ = 1.496, all in the metric of $\varphi$, the common logarithm of the probability, $\pi$, of the occurrence of a given type in the theoretical population token distribution. Since the formulations in the mathematical section of this paper are in natural logarithms, the corresponding values in natural logarithms are given as follows: $\mu$ = …, $\mu_T$ = …, $\sigma$ = ….

Type distribution, Model I. From the above, $\rho_1$ can be immediately computed as $-\sigma^2/\mu_T$ = …. By use of the computer program in Appendix C, one can find that the value of $v$ yielding this ratio is 1.367, and that the corresponding parameters of the distribution $\{z\}$ are $E\{z\}$ = …, $\mathrm{Var}\{z\}$ = …. The density distribution of $x$ is shown in Figure 6. By equation (1), $m$ = …. By use of the computer program in Appendix F, a terminal type distribution was

generated from these values for 5, types; this is plotted in histogram form in Figure 7 and in cumulative proportional frequency form (with lognormal coordinates) in Figure 8, with the theoretical lognormal distributions shown for comparison. The terminal distribution has a slight negative skewness; this is possibly associated with the fact that its upper bound is ~ = and its lower bound approaches negative infinity. Nevertheless, it is sensibly close to a normal distribution.

Insert Figure 6 about here

Insert Figures 7 and 8 about here

Serious problems are encountered in the attempt to use the Monte-Carlo-generated terminal type distribution as a basis for drawing a finite sample of tokens for comparison with observed data.

(1) The generation of the terminal distribution gives its approximate shape in terms of proportional frequencies for intervals of ~, but it does not indicate how many types there are in the theoretical terminal distribution. In the Monte-Carlo generation, the number of types is an input parameter (5, being used in the present case). If we knew the correct number of types, N_T, and were to use that as an input parameter, the expected sum of the probabilities π corresponding to the values of t, where π = exp(t), would equal unity.

(2) Despite the very large number of values generated, the shape of the upper tail is likely to be somewhat unreliable, since values near ~ = are very infrequent and improbable. (This is almost paradoxical, because it is precisely these values, as probabilities, that are largest.) Yet the shape of the upper tail must be determined as accurately as possible in order to determine the frequencies of the most frequent types.

With respect to the first problem, it may be remembered that the number of types in the lognormal type distribution is a function of its parameters μ and σ². It is not wise to assume that this value also applies to the asymptotic lognormal

distribution generated under Model I, particularly in view of the negative skew of the terminal distribution and the consequently inflated number of types of low probability existing in the lower tail. Therefore, some expedient must be developed.

As was observed above, the sum of the probability values of N_T types should equal unity. Therefore, when we generate N such quantities, the sum of the probabilities could be regarded as proportional to unity as N is proportional to N_T. For example, the sum of the 5, type probabilities generated under Model I happened to be .45971; thus it would be estimated that if 5/ = type values had been generated, their sum would be unity. Unfortunately, this reasoning fails to take account of the unreliability of the upper tail of the terminal distribution. The last value in the cumulative frequency distribution can vary widely by chance. However, the general form of the cumulative frequency distribution should be reasonably stable; if the respective values were multiplied by some constant w, they should approximate the cumulative proportional frequencies of the normal curve. Then Nw should be an estimate of N_T.

To formalize this: let the cumulative probability at the upper bound of an interval k in the Monte-Carlo-generated token distribution be denoted a_k; then find the number w by which all a_k should be multiplied such that

I k ~ k,w a minimum, (16)

where ~k,w is the normal deviate corresponding to w a_k in the normal distribution, and ~k = (~k - μ_w)/σ is a normal deviate in the distribution N{μ_w, σ²}, ~k being the value of ~ for the upper bound of the interval k. μ_w can either be set equal to μ of the lognormal token distribution, in which case (16) attains a certain minimum value, or allowed to vary until (16) attains an even smaller minimum. (σ could be allowed to vary in the same way, but such variation has been found not to be profitable.) Because of the skewness of the Monte-Carlo type distribution, only intervals

in the central and right-hand portions are used for these calculations. The summation is taken from k = 12 up to the last interval containing a cumulative type frequency value less than unity, where k is scaled so that k = the integral part of (22 + 4~) in the distribution shown, and ~ is the normal deviate in the type distribution. Appendix H is a computer program for minimizing (16).

For the present data, when we fix μ_w = μ = , we find w = 1.95, and consequently 5w = 547,5 is the estimated number of types in the theoretical distribution. When μ_w is allowed to vary, (16) reaches a minimum when μ_w = , w = 16.4, giving 5w = 82, estimated types. These estimated values are to be compared with the value n_T = estimated from lognormal theory. Which of the estimates is more nearly correct is unknown, but the lower estimate yields a theoretical token distribution more nearly approximating the lognormal token distribution. The token distribution thus projected, with w = 1.95, is plotted in histogram form in Figure 7, and in cumulative proportional frequency form in Figure 8, where it is seen that the distribution is approximately a straight line when plotted on lognormal coordinates. The slight downward bend at the upper end is possibly due to the unreliability of the upper tail and the very low probability of extremely high values.

An attempt was then made to predict the sample type and token distributions that would result for a sample of tokens on the assumption of the type distribution generated by the Monte-Carlo procedure. This involved fitting the curve of Figure 8 with a series of polynomial equations for = f (~ d), then determining the number of types in the theoretical distribution for intervals of 4,. At first, the lower estimate of the number of types was tried (547,5), but since this seemed to lead to certain inconsistencies the larger estimate was tried (82,). The resulting theoretical distribution of types was submitted to the computer program (shown in Appendix G) for delineating the predicted sample type and token distributions.
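The scaling step described around (16) can be sketched as follows. This is a minimal illustration and not the Appendix H program: the printed form of (16) is not legible in this copy, so a mean of squared differences between the two sets of normal deviates is assumed as the criterion, and the function name `fit_scale`, the grid search over w, and all argument names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used to turn proportions into deviates


def fit_scale(upper_bounds, cum_props, mu_w, sigma, w_grid):
    """Return (w, loss) for the best multiplier w on a grid.

    ASSUMPTION: the criterion is the mean squared difference between
    the deviate of w * a_k and the deviate (t_k - mu_w) / sigma.

    upper_bounds: upper bound t_k of each interval (log-probability scale)
    cum_props:    cumulative proportion a_k at each upper bound
    mu_w, sigma:  mean and s.d. of the target normal distribution
    w_grid:       candidate multipliers to try
    """
    best_w, best_loss = None, float("inf")
    for w in w_grid:
        total, used = 0.0, 0
        for t_k, a_k in zip(upper_bounds, cum_props):
            p = w * a_k
            if not 0.0 < p < 1.0:
                continue  # deviate undefined once w * a_k reaches unity
            dev_scaled = _STD.inv_cdf(p)
            dev_target = (t_k - mu_w) / sigma
            total += (dev_scaled - dev_target) ** 2
            used += 1
        if used and total / used < best_loss:
            best_w, best_loss = w, total / used
    return best_w, best_loss
```

Given a best-fitting w, the estimated number of types is the number of generated values multiplied by w, as in the text. Averaging over the intervals actually used is a design choice of this sketch, made so that candidate values of w that exclude different numbers of intervals remain comparable.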
The results are plotted in Figure 9. Unfortunately, the attempt was

not very successful; it led to a much inflated estimate of the number of types in the sample. Also, the lines for both the predicted type and token distributions are considerably above the positions of the corresponding lines for the observed data. Further trials, with different estimates of the total number of types and different ways of fitting the theoretical type distribution curve, were not considered profitable at this juncture. (The computations are fairly expensive.) What is notable about the results shown in Figure 9 is that the predicted sample curves have shapes comparable to those of the observed data (as plotted in Figure 2), even though their parameters are apparently wrong. This is evidence for the supposition that the observed data might arise from type distributions generated under Model I of our rationale.

Insert Figure 9 about here

Token distribution, Model I. Although the calculations just given for types included projected token distributions, Model I can also be applied directly to tokens to make more precise the probable shape of the token distribution. For this case we set μ_u = μ, σ_u² = σ². Using the data previously given, we have P1 = -σ²/μ = / ; from this, v is found to be .658, E{z} = , Var{z} = . The density distribution of the corresponding variable {x} is plotted in Figure 6. By equation (1), m = . A terminal distribution for 5 tokens was generated by the Monte-Carlo procedure and its plot is included in Figure 8. This distribution has a much more pronounced skew than that of the generated type distribution. On the lognormal plot, the line is considerably more curved than one would expect for an asymptotic normal distribution. Furthermore, the line tends to miss the location of the observed data (which for values of ~ from about -14 to -1 in natural logs are quite close to the lognormal line, as may be seen in Figure 2). Nevertheless, the curviness of the upper portion of the line gives what is probably an accurate

impression of the shape of the theoretical token distribution for this region, since it has an upper bound of ~ = and therefore the curve must eventually curve up to positive infinity at that bound. No attempt was made to derive a predicted sample token distribution from these data; the program used for such derivations requires information on the theoretical type distribution.

Type distribution, Model II. Since the quantity -σ²/μ_T = .6647 was less than , a value of v that would make equation (15) true does not exist for this case. It was noticed, however, that a slight adjustment of the parameter μ_T would permit the value of v to be infinity, a value for which all values in the distribution {x} are .5. It was felt that adjustment of the parameter μ_T was justifiable because of problems in estimating it by lognormal theory. If we fix Var{u} at its observed value of , we can solve equation (15) for E{u} at the minimum value of the equation when v is infinity, namely (see footnote 4, page 34). E{u} is then , not very different from the originally estimated value of . This yields ~ = , also not very different from the original value of . By equation (14), λ = . A special variation was introduced into the computer program of Appendix F to permit Monte-Carlo generation of a Model II distribution based on an initial distribution of {x} where all x = .5. The resulting distribution, shown in Figure 10, was generally congruent with the lognormal distribution. (The irregularity is mainly due to the fact that the logarithms of certain powers of .5 happened to fall unequally into the class intervals used. The upward bend at the extreme lower tail is due to small numbers of cases; to save computer time, only 1 types were generated.)

Insert Figure 10 about here
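The Monte-Carlo generation under the two models, including the special case in which every x is .5, might be sketched as below. This is an illustration under stated assumptions, not the Appendix F program: a symmetric Beta(v, v) density stands in for the paper's distribution of x (which is defined in the mathematical section and not reproduced here), the depth m of Model I is restricted to whole numbers, and every function name is hypothetical.

```python
import math
import random


def draw_x(v, rng):
    """Draw one initial probability x, symmetric about .5.

    ASSUMPTION: Beta(v, v) is a stand-in for the paper's density of x.
    v = math.inf gives the special case in which every x equals .5.
    """
    if math.isinf(v):
        return 0.5
    x = rng.betavariate(v, v)
    return min(max(x, 1e-300), 1.0)  # keep log(x) finite


def poisson(lam, rng):
    """Poisson variate by Knuth's multiplication method (small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def terminal_log_prob(v, depth, rng):
    """Logarithm of a continued product of `depth` initial probabilities."""
    return sum(math.log(draw_x(v, rng)) for _ in range(depth))


def model_one(v, m, n, seed=0):
    """Model I: every terminal value is a product of exactly m factors."""
    rng = random.Random(seed)
    return [terminal_log_prob(v, m, rng) for _ in range(n)]


def model_two(v, lam, n, seed=0):
    """Model II: the number of factors is 1 plus a Poisson(lam) variate."""
    rng = random.Random(seed)
    return [terminal_log_prob(v, 1 + poisson(lam, rng), rng) for _ in range(n)]
```

With `v = math.inf`, `model_two` can only return whole-number multiples of ln(.5), which is one way to see why the histogram is irregular when such values fall unevenly into class intervals.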

In view of the generally linear form of this distribution, no attempt was made to predict sample type and token distributions for tokens to compare with observed data. It may be expected rather confidently that the predicted sample distributions would be close to the observed data. For the Lorge Magazine data, at least, Model II yields a close fit to the lognormal form. The only problems are those of estimating parameters.

Token distribution, Model II. If v is taken to be 1.514, equation (15) is true for E{u} set equal to μ and Var{u} set equal to σ²; P2 = 1.912, E{z} = , Var{z} = . By equation (14), λ = (-6.779/-.9797) - 1 = . A terminal distribution {u} for 5, tokens was generated by Monte-Carlo techniques; the results are plotted in Figure 10 with the theoretical lognormal token distribution shown for comparison. This curve has a pronounced skewness and the lower portion may be unrepresentative of the true distribution. The curviness of the upper portion, however, gives what is probably an accurate indication of the shape of the upper tail of the token distribution.

Word Association Data (Associations to "LIGHT")

From the lognormal fitting of the data, the estimated parameters of the lognormal distribution were as follows, in both common and natural logarithms:

          Common logs    Natural logs
  μ
  μ_T
  σ

Type distribution, Model I. From the above, P1 = -σ²/μ_T = .9595, v = 1.3, E{z} = , Var{z} = .91179, m = . The density distribution of {x} is plotted in Figure 11. Results of a Monte-Carlo generation of the terminal distribution of 5, types are shown in Figure 12. This distribution is similar to that

for the Lorge Magazine data, Types, Model I, in that it has considerable negative skewness and the curve plotted on lognormal coordinates departs considerably from the lognormal. Calculations similar to those described for the corresponding case for the Lorge data were carried out for the proportional frequencies yielded by this terminal distribution. For μ_w fixed at μ = , w = .3, yielding 5w = 15 estimated types; for μ_w allowed to vary, μ_w = , w = .7, yielding 35 estimated types. Since the former value gave a curve closer to the lognormal token distribution (and the observed data), that curve is plotted in Figure 12 (the points represented by "x"). It should be observed, however, that even the highest point reaches only the value ~ = because of the unreliability of the upper tail of the terminal distribution; this projected token distribution is not very informative, and it may not be accurate even in the lower portion because of the assumptions made in its computation.

Insert Figures 11 and 12 about here

Because of the many problems in estimation, no attempt was made to develop predicted sample type and token distributions on the basis of the Monte-Carlo-generated type distribution.

Token distribution, Model I. Using the lognormal parameters given above, it was found that P1 = -σ²/μ = , v = .622, E{z} = -8.44, Var{z} = , m = .838. However, this value of m seemed unreasonable, being much less than unity and therefore implying fractional parts of {z}. It was realized that the theoretical distribution of tokens, even though based on an underlying lognormal distribution with the parameters stated above, has an unusual shape in that the most frequent type has the very high probability value of .6438 and hence covers the area over the range to + (in terms of normal deviates of the token distribution). Because of the very broad category for the most frequent type, the lognormal parameters may not

have reflected the actual mean and variance of this distribution. Calculating the mean and variance on the basis of the individual type (logarithmic) probabilities, one finds the mean = and variance (in the metric of natural logarithms). From these latter values, one finds P1 = 3.146, v = .487, E{z} = , Var{z} = , m = , a somewhat more reasonable value than obtained above. The density distribution of {x} is plotted in Figure 11. The Monte-Carlo-generated terminal distribution, with N = 5, is shown in Figure 12 (the curve labeled "Token Distribution, Recomputed Parameters"). This curve has pronounced skewness and does not approximate the lognormal line very well; nevertheless it coincides with the sample data fairly well.

Type distribution, Model II. Using the original data for the theoretical lognormal type distribution, we find that the ratio -σ²/μ_T is greater than , and therefore a value of v exists that will make equation (15) true. (In the comparable case for the Lorge data, it will be remembered, v did not exist.) This value of v is ; the associated parameters are E{z} = , Var{z} = .1739, P2 = .9933, and by equation (14), λ = . These values could have been used for generating a terminal distribution by the Monte-Carlo technique, but the value of λ seemed to be too large to be reasonable for the word-association data. Instead, an extrapolation was made from the recomputed values for the theoretical token distribution (see above). By lognormal theory, μ_T = μ - σ² (when values are in natural logarithms). A new value of μ_T was therefore found to be - (2.546)² = . There is some question as to whether this extrapolation was justified in view of the fact that the recomputed values were based on an irregularly-grouped distribution. Nevertheless, by this procedure, parameter values were yielded as follows: v = , λ = , E{z} = , Var{z} = .693, P2 = .826.
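The two calculations used in this passage, recomputing the token mean and variance from individual type log probabilities and moving between type and token means, can be checked numerically. The sketch below is illustrative only: `token_moments` is a hypothetical helper, the weights are renormalized to sum to one (an assumption needed because the list is only a sample of types), and the parameter values are toy figures, not estimates from either data set.

```python
import math
import random


def token_moments(log_probs):
    """Mean and variance of log probability in the token distribution.

    Each type is weighted by its own probability exp(t); the weights are
    renormalized so that they sum to one.
    """
    shift = max(log_probs)  # guards against underflow in exp()
    weights = [math.exp(t - shift) for t in log_probs]
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, log_probs)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, log_probs)) / total
    return mean, var


rng = random.Random(0)
mu_type, sigma = -9.0, 1.0  # toy values in natural logarithms
type_log_probs = [rng.gauss(mu_type, sigma) for _ in range(100_000)]
mu_token, var_token = token_moments(type_log_probs)
print(round(mu_token, 2), round(var_token, 2))
```

For exactly lognormal types the token mean should come out near `mu_type + sigma**2` with the variance unchanged, which is the relation between the type mean and the token mean quoted in the text, read in the other direction.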
It will be noticed that with this value of v, the distribution of initial probabilities {x} tends to be extremely peaked at x = .5, verging towards an approximation of the kind of distribution obtained with the Lorge Magazine Count data, Types, Model II, with v = infinity (see

Figure 11). A Monte-Carlo-generated terminal distribution based on these values, with N = 1,, is plotted in Figures 13 and 14 in histogram and cumulative proportion form, respectively. Although this line shows some curvilinearity, it is a fair approximation to a lognormal distribution.

Insert Figures 13 and 14 about here

Again, because of problems of estimation, no attempt was made to derive from this theoretical type distribution the predicted sample type and token distribution for N = 15 to investigate match with the observed data, but it will be seen that the theoretical type distribution is in the neighborhood of the observed sample type distribution.

Token distribution, Model II. Although the original parameters of the data are such as to suggest that a value of v exists, it eluded the iterative computations that were designed to determine it. Apparently it approaches zero. Using the recomputed data for the theoretical lognormal token distribution, we find v = .4561, E{z} = , Var{z} = , λ = . The average number of values from x whose logarithms are summed to produce {u} is therefore (λ + 1) = , a value closely approximating that found under tokens, Model I. The density distribution of {x} is plotted in Figure 11, and a Monte-Carlo-generated terminal distribution, N = 1,, is shown in Figures 13 and 14. This fits the observed data slightly better than the Model I curve even though it is a theoretical population distribution rather than a predicted sample distribution. For the range of values of ~ represented by the observed data, sample values should be very close to population values.

Discussion

The tests of the models are in some respects far from satisfactory because it proved difficult to estimate all the parameters needed to produce predicted

distributions that could be directly compared to sample data. Further work must be done on the problem of estimating the number of types implied by a given theoretical distribution. Also, work could be done on developing refinements of the models presented here, or similar models. Another distribution function for the initial probabilities might be more satisfactory than the one used here.

Despite the difficulties that have turned up, it nevertheless seems clear that some form of asymptotic lognormal generating function would give adequate fits to observed data. These functions would embody in any case the basic principle employed in developing the present models, namely that the terminal distributions of probabilities represent continued products, according to some rule, of randomly distributed initial probabilities.

Of the two models analyzed here, Model II seems to give somewhat better fits to the lognormal distribution, a distribution to which observed data conform well. Also, Model II seems more reasonable in terms of the rationale presented here, for it permits a variable depth. Corresponding to variable depth would be the phenomenon whereby some words (e.g., thing, do, good, etc.) have a larger coverage of possible referents and therefore would be reached quite early in the "decision tree," whereas others (e.g., gyroscope, calumniate, salutary) have much more exact meanings and therefore would be reached less soon in a decision tree.

Although this article presents analyses of only two sets of data, some speculation can be offered as to the meaning of the two parameters, v and m or λ, that have been utilized in Model I and Model II. It may be emphasized that all the Monte-Carlo-generated distributions presented here depend only on those parameters, given the algorithms that are used for the generation process. To examine their meanings, a summary table has been prepared of all the parameter values derived in this study, Table 4.

Insert Table 4 about here

In comparing the "depth" parameter m of Model I with that of Model II, λ, it is useful to increase λ by 1, so that both m and (λ + 1) represent the (mean) number of factors in the continued products represented by the terminal distributions. Clearly, m or (λ + 1) are measures of the degree to which, on the average, the types or tokens in the terminal distributions represent finely divided meanings; m or (λ + 1) can thus be taken as indicators of vocabulary size. As such, they are more readily interpretable in connection with type distributions than with token distributions. The average "depth" of types in a decision tree will be greater than the average depth of tokens because in a token distribution the latter is highly biased towards the more frequent types. In the present data, the Lorge Magazine Count distribution always has greater depth than the word association distribution when comparable methods of estimating depth are used.

The parameter v, on the other hand, refers to the "diversity" of the vocabulary distribution. It also reflects its negative entropy or information. When v is low, the distribution of initial probabilities is concentrated near 0 and 1; thus decisions at nodes of the decision tree are highly determined. This case is exemplified by the word association data. When v is large, the decisions are less highly determined, most of them being near .5 in probability. The resulting terminal distribution is relatively more "diverse" for a given depth, in the sense that the probability that any two tokens sampled from the distribution will be different is higher. Presumably, there would be a connection between v and Yule's (1944) "characteristic K," also with Good's (1953) index of repetition, but these connections have not been investigated. Again, v is more interpretable with type distributions than token distributions because it controls the distribution of the initial probabilities that generate the terminal probabilities.
In fact the whole theory outlined here is conceptually more tied to type than to token distributions, the latter being regarded as derivates of the type distributions.
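The sense of "diversity" used above, the probability that two tokens sampled independently from a terminal distribution are of different types, equals one minus the sum of the squared type probabilities. A small sketch follows; the function name `diversity` and the two depth-2 toy trees are hypothetical, and no connection to Yule's K or Good's index is attempted here.

```python
def diversity(probs):
    """Probability that two independently drawn tokens differ in type."""
    total = sum(probs)
    shares = [p / total for p in probs]  # renormalize against rounding
    return 1.0 - sum(s * s for s in shares)


# Toy terminal distributions for a binary tree of depth 2 (four terminals).
# Node probabilities of .5 spread the mass evenly over the terminals ...
even = [0.5 * 0.5, 0.5 * 0.5, 0.5 * 0.5, 0.5 * 0.5]
# ... while node probabilities of .9 and .1 concentrate it on one terminal.
skewed = [0.9 * 0.9, 0.9 * 0.1, 0.1 * 0.9, 0.1 * 0.1]

print(diversity(even), diversity(skewed))
```

At the same depth, the tree whose node probabilities sit at .5 yields the higher value, which matches the reading of large v as less highly determined decisions.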

It was mentioned earlier that the "delete-half-word-type" procedure (Carroll, 1968, pp. ) applied in lognormal fitting of data seems to give more satisfactory results. The probable reason for this is that the distribution of logarithmic probabilities has an upper bound at zero (where π = 1). The deletion of one-half word type from the upper tail of the type distribution, and a corresponding adjustment of probabilities so that they will sum to unity, apparently tends to compensate for the truncation of the normal curve at ~ = 0. In the case of word association data where there is a strong "primary" frequency and where the distribution of logarithmic token probabilities is extremely skewed (as may be seen in Figure 13), an even more drastic adjustment procedure is necessary, namely, to fix the probability of the most frequent type in the theoretical type distribution at its value in the observed data, on the (generally reasonable) assumption that the observed value is a good estimate of the population probability.

The rationale offered here may have some relation to certain aspects of psycholinguistic theory.

(1) Some of the nodes in the tree structure generating terminal probabilities may correspond to semantic and other types of features differentiating words. Words with lower type probabilities will in general have a large number of these features.

(2) The theoretical distribution of types may represent a person's or a group's "competence" (i.e., words known but not necessarily used), while the observed distributions of types and tokens reflect "performance" (words actually used, with their frequencies). Yet the present theory suggests that competence and performance are intimately bound, since the distribution of types is measured in terms of token probabilities. It is as if a probability of usage is attached to an individual's knowledge of a word.

(3) The theory is related to the general problem of the manner in which lexical items "segment reality." At least some of the nodes in the postulated decision tree may represent classifications of reality.
The number of classifications must be limited, since the measures of depth seldom exceed some relatively small number.

The present rationale might appear to have no relation to theories of grammar, since the same distributional phenomena would hold even if all the words in a text were scrambled. Indeed, the reader may wonder how this rationale could be applied equally well to word association data, where the words are individually collected from a group of subjects, and to continuous texts such as those analyzed by Lorge. The rationale has to do with the conditions under which an individual word is selected. As each word in a continuous text is selected, the conditions change; different structures and different semantic and grammatical features are involved. Likewise, each subject in the word association task emits his response under a different pattern of conditions. The nodes of the "decision tree" proposed here are intended to represent the considerable number of factors or conditions that may apply to the selection of a given word. Many of these features would be of a grammatical character; in fact, some grammatical processing of responses was assumed in the analysis of the word association data. Nevertheless, the rationale can obviously give no leads as to what grammatical or semantic features in a decision tree there are; it merely attempts to motivate, in a statistical and probabilistic manner, the end result of word-selection processes.

References

Aitchison, J., & Brown, J. A. C. The lognormal distribution. Cambridge: Cambridge University Press.

Carroll, J. B. On sampling from a lognormal model of word-frequency distribution. In H. Kucera & W. N. Francis, Computational analysis of present-day American English. Providence, R. I.: Brown University Press, 1967.

Carroll, J. B. Word-frequency studies and the lognormal distribution. In E. M. Zale (Ed.), Proceedings of the Conference on Language and Language Behavior. New York: Appleton-Century-Crofts, 1968.

Carroll, J. B., Kjeldergaard, P. M., & Carton, A. S. Number of opposites versus number of primaries as a response measure in free-association tests. Journal of Verbal Learning and Verbal Behavior, 1962.

DiVesta, F. J. The distribution of modifiers used by children in a word-association task. Journal of Verbal Learning and Verbal Behavior, 1964, 3.

Good, I. J. The population frequencies of species and the estimation of population parameters. Biometrika, 1953, 4.

Herdan, G. Type-token mathematics: A textbook of mathematical linguistics. 's-Gravenhage: Mouton, 196.

Howes, D. Application of the word-frequency concept to aphasia. In A. V. S. de Reuck & M. O'Connor (Eds.), Ciba Foundation Symposium on Disorders of Language. London: Churchill, 1964.

Kapteyn, J. C. Skew frequency curves in biology and statistics. Groningen: Noordhoff, 193.

Kelley, T. L. The Kelley statistical tables. New York: Macmillan.

Mandelbrot, B. On the theory of word frequencies and on related Markovian models of discourse. In R. Jakobson (Ed.), Structure of language and its mathematical aspects. Providence, R. I.: American Mathematical Society, 1961. (Proceedings of Symposia in Applied Mathematics, Vol. XII.)

Miron, M. S., & Wolfe, S. A cross-linguistic analysis of the response distributions of restricted word associations. Journal of Verbal Learning and Verbal Behavior, 1964, 1.

Moran, L. J., Mefferd, R. B., & Kimble, J. P. Idiodynamic sets in word association. Psychological Monographs: General and Applied, 1964, 78 (2, Whole No. 579).

Russell, W. A., & Jenkins, J. J. The complete Minnesota norms for responses to 1 words from the Kent-Rosanoff Word Association Test. Minneapolis, Minn.: University of Minnesota, Department of Psychology.

Simon, H. A. Models of man. New York: Wiley, 1957.

Somers, H. H. Analyse mathématique du langage: lois générales et mesures statistiques, I. Louvain, Belgium: Nauwelaerts, 1959.

Thorndike, E. L., & Lorge, I. The teacher's word book of 3, words. New York: Bureau of Publications, Teachers College, Columbia University.

Yule, G. U. The statistical study of literary vocabulary. Cambridge: Cambridge University Press, 1944.

Footnotes

1 This research was supported by the National Institute of Child Health and Human Development, under Research Grant 1 POI HD1762. I express thanks to Walter Kristof and Roy Freedle, who provided helpful comments on an early draft of this paper, and to Miss Barbara Witten, who assisted in computation and data analysis.

2 In speaking of a type or word-class, we do not specify how a word-class is defined. In many word-counts, the type is defined as a particular spelled form; e.g., like, likes, liked, liking are four separate word-classes, regardless of the meaning or grammatical part of speech of the form. In other word-counts, the entries are "lemmatized"; that is, word-classes are differentiated by meaning and/or part of speech. In such a count, the word classes might include, for example, like (noun, including plural and possessive forms), like (verb, including morphologically related forms such as likes, liked, liking), like (adjective), like (preposition), etc. As far as is known, the present rationale may apply equally well regardless of how the word-classes are defined; the nature of the word-classes may, however, affect the parameters of the distributions.

3 For this case, the value of the proportional frequency of the most frequent type in the theoretical token distribution was fixed to equal its proportional frequency in the observed data.

4 Following the convention of Aitchison and Brown, the notation {x} will frequently be used to mean "the distribution of x." The notations E{x} and Var{x} refer to "the expected value of x" and "the expected variance of x." The notation Λ(mμ, mσ²) means "a lognormal distribution with mean mμ and variance mσ²."

5 I am indebted to Dr. Roger J. Owen, Visiting Research Fellow at Educational Testing Service during the year , for these derivations, which are given in Appendix B.

6 As v → ∞, P2 reaches its minimum value, 2[-ln(.5)] = ; -ln(.5) = , and E{z} = . Therefore, the quantity -(σ_u²)/μ_u must be greater than in order for v to exist.

7 One might also suggest a fifth kind, prior even to semantic interpretation of the stimulus, which would represent the decision to emit a "clang" or rhyming association rather than to consider the stimulus as a meaningful word.

Table 1

Illustrative Word-Frequency Data: Distribution of Occurrence Frequencies of Word-Types in the Russell-Jenkins Data for Associations to "LIGHT," with Computations Leading to Lognormal Graphical Representation

Column headings:
Occurrence Frequency (h)
Frequency (f_h)
Number of Tokens (f_h h)
Proportional Frequency at Upper Boundary of Interval, p' = (h + 1/2)/N
log p' = ~'
Type Distribution: Cum. Prop.; Normal Deviate
Token Distribution: Cum. Prop.; Normal Deviate

Sums = n = N

Table 2

Chi-Squared Tests of Fit between Predicted and Observed Type and Token Distributions, Data for Associations to LIGHT

Type Distribution (Frequencies)
h    Observed    Predicted
d.f. = 2 (a), χ² = , p >

Token Distribution (Frequencies)
h    Observed    Predicted
d.f. = 2 (a), χ² = .1828, p > .9

(a) For the type distribution, the fitting involved, in essence, a common mean and variance but not a common number of types; thus d.f. = 4 categories - 2 parameters = 2. For the token distribution, the fitting involved common N, mean, and variance, also a common value for h = 647; thus, d.f. = 6 categories - 4 parameters = 2. (Because of various rounding errors, the actual predicted value for h = 647 was rather than 647, but the difference is trivial.)

Table 3

Fit between Predicted and Observed Type and Token Distributions, Lorge Magazine Count Data

Type Distribution: h; Observed; Predicted; Observed/Predicted
Token Distribution: h; Observed; Predicted; Observed/Predicted

Table 4

Summary of Parameters Estimated for Two Sets of Data by Two Asymptotic Lognormal Models

                                        Type Distributions    Token Distributions
Model I (constant depth)                v      m              v      m
  Lorge Magazine Count
  Word Associations to "LIGHT"
    Original estimates
    Recomputed estimates
Model II (Poisson depth)                v      λ + 1          v      λ + 1
  Lorge Magazine Count                  ()     (25.7 )
  Word Associations to "LIGHT"
    Original estimates
    Recomputed estimates

Figure Captions

Fig. 1. Lognormal plot of data for word associations to LIGHT, with theoretical type and token distributions in the population, type and token distributions predicted for a sample with N = 15, and the observed type and token distributions, N = 15.

Fig. 2. Lognormal plot of data for the Lorge Magazine Count, with theoretical type and token distributions in the population, type and token distributions predicted for a sample with N = 4,591,122, and the observed type and token distributions, N = 4,591,122.

Fig. 3. Density functions of x for selected values of v.

Fig. 4. Integral functions of x for selected values of v.

Fig. 5. Tree structure of word associations to LIGHT.

Fig. 6. Density distributions for several values of v estimated for the Lorge Magazine Count.

Fig. 7. Histogram (a) for Monte-Carlo-generated type distribution, Model I, based on parameters estimated for the Lorge Magazine Count, N = 5,; (b) for token distribution projected from the type distribution on the assumption of 547,5 types. Normal distribution curves are provided for comparison in each case.

Fig. 8. Lognormal plot for several Monte-Carlo-generated distributions and projected distributions based on parameters estimated for the Lorge Magazine Count, Model I.

Fig. 9. Results of an attempt to predict sample type and token distributions from Monte-Carlo-generated population distributions based on parameters estimated for the Lorge Magazine Count, Model I.

Fig. 10. Lognormal plot for several Monte-Carlo-generated distributions based on parameters estimated for the Lorge Magazine Count, Model II.

Fig. 11. Density functions of x for several values of v estimated for the word associations to LIGHT.

Fig. 12. Lognormal plot for Monte-Carlo-generated distributions based on parameters estimated for the word associations to LIGHT, Model I, including observed data in the sample and a token distribution projected from the generated type distribution.

Fig. 13. Histograms of Monte-Carlo-generated type and token distributions based on parameters estimated for the word associations to LIGHT, Model II, recomputed estimates.

Fig. 14. Lognormal plot for Monte-Carlo-generated distributions based on parameters estimated for the word associations to LIGHT, Model II (recomputed estimates).

[Figure 1. Word associations to "LIGHT": lognormal plot. Legible curve labels: Observed Data; Theoretical Type Distribution; Theoretical Token Distribution; Predicted Type; Predicted Token. Horizontal axis: log (word probability).]


[Figure 3. Density functions of x, one curve per value of ν (one legible curve label: ν = 1.0). Horizontal axis: x; vertical axis: density.]

[Figure 4. Integral functions of x. Horizontal axis: x.]

[Figure 5. TREE STRUCTURE OF ASSOCIATIONS TO LIGHT. Tree diagram ending in a column headed TERMINAL RESPONSES. Legible notes: SAME PART OF SPEECH; * APPEARS IN 2 TERMINAL STRINGS - FREQUENCIES ADJUSTED.]

[Figure 6. Density distributions of x; one legible curve label: "(Lorge Types, Model I)". Horizontal axis: x; vertical axis: density.]

[Figure 7. Histograms. Legible axis labels: Proportional Frequency (X); Frequency.]

[Figure 8. LORGE MAGAZINE COUNT, ANALYSIS BY MODEL I. Legible curve labels: Lognormal Type Distribution; Monte-Carlo-Generated Curve, Type Distribution; Monte-Carlo-Generated Curve, Tokens; Lognormal Token Distribution; Token Distribution derived from Monte-Carlo Type Distribution. Horizontal scales in natural logs and common logs.]

[Figure 9. LORGE MAGAZINE COUNT. Legible curve labels: Fitted Curve, Type Distribution, Model I; Lognormal Population Type Distribution; Predicted Sample Type Distribution; Predicted Sample Token Distribution; Lognormal Population Token Distribution. Horizontal scales in natural logs and common logs.]

[Figure 10. LORGE MAGAZINE COUNT, ANALYSIS BY MODEL II. Legible curve labels: Lognormal Type Distribution; Monte-Carlo-Generated Type Distribution; Monte-Carlo-Generated Token Distribution; Lognormal Token Distribution. Horizontal scales in natural logs and common logs.]

[Figure 11. Density functions of x; legible curve labels include "(Types, Model I)" and "(Tokens, Model I)". Horizontal axis: x; vertical axis: density.]

[Figure 12. K-R "LIGHT" ASSOCIATIONS, MODEL I. Legible curve labels: Monte-Carlo-Generated Type Distribution (original estimates of parameters); Lognormal Type Distribution; Monte-Carlo-Generated Token Distribution (recomputed parameters); Sample Data; Token Distribution Projected From Generated Type Distribution. Horizontal scales in natural logs and common logs.]

[Figure 13. K-R "LIGHT", MODEL II (recomputed data). Histograms labeled Token Distribution and Type Distribution. Horizontal scales in natural logs and common logs.]

[Figure 14. K-R "LIGHT" ASSOCIATIONS, MODEL II (based on recomputed parameters). Legible curve labels: Lognormal Type; Monte-Carlo-Generated Type Distribution; Sample data (Tokens), (Types); Token Distribution. Horizontal scales in natural logs and common logs.]

APPENDIX A

Computer Program for Calculating Density and Integral Functions of {x} as Defined by Equation (4) of the Text

This program is written in the language known as TELCOMP, developed by Bolt Beranek and Newman, Inc., Cambridge, Mass., and used with their time-sharing service accessible by Teletype. The stored program has two "parts" and three "forms." A sample output is shown. Material typed by the operator is underlined. Values of the density and integral functions are obtained only for a certain range of values of x up to .5. The values beyond x = .5 are symmetric with those up to .5. If a finer range of values is desired, statement 1.2 can be changed accordingly. Normally the program is entered with a statement such as

DO PART 1 FOR NU=.5,.6,.8, 1.3

or

DO PART 1 FOR NU=.5:.02:.6

the latter producing tables for NU=.5,.52,.54,...,.6. This program can be easily translated into other computing languages such as FORTRAN.

1.5; PROG.<DENSNU> DENSITY DISTRIBUTION FOR INITIAL PROBABILITIES
1.6; ENTER "DO PART 1 FOR NU=(LIST OR SERIES)"
1.7 TYPE #
1.8 TYPE NU IN FORM
TYPE FORM
DO PART 2 FOR X=.1:.1:.1:.5:
F=2↑(NU-1)*NU*X↑(NU-1),INTGL=2↑(NU-1)*X↑NU
2.2 TYPE X,F,INTGL IN FORM 2
FORM 1
  X F INTGL
FORM 2
  H.'D HD#.',#I #.,,##
FORM 3
  NU= #fi.fll#d

DO PART 1 FOR NU=.4561
NU=
  X F INTGL
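The two assignments in the listing can be sketched in a modern language as follows. This is a minimal illustration, not the author's program: it assumes the listing's formulas read F = 2^(ν-1)·ν·x^(ν-1) and INTGL = 2^(ν-1)·x^ν for x up to .5, and it assumes that "symmetric" means the density is mirrored about x = .5 and the integral satisfies F(x) = 1 - F(1 - x). The function names are hypothetical.

```python
def density(x, nu):
    """Density of x for 0 < x < 1, mirrored about x = 0.5 (assumed symmetry)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    t = x if x <= 0.5 else 1.0 - x
    return 2.0 ** (nu - 1.0) * nu * t ** (nu - 1.0)


def integral(x, nu):
    """Integral (cumulative) function of x, using F(x) = 1 - F(1 - x) above 0.5."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    t = x if x <= 0.5 else 1.0 - x
    lower = 2.0 ** (nu - 1.0) * t ** nu
    return lower if x <= 0.5 else 1.0 - lower


def table(nu, xs):
    """Rows of (x, density, integral), analogous to the X, F, INTGL columns."""
    return [(x, density(x, nu), integral(x, nu)) for x in xs]


if __name__ == "__main__":
    for x, f, big_f in table(0.8, [0.1, 0.2, 0.3, 0.4, 0.5]):
        print(f"{x:5.2f} {f:10.5f} {big_f:10.5f}")
```

Under these assumptions the integral function equals one half at x = .5 for every ν, which is a quick way to confirm that the two branches join correctly.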

APPENDIX B

Expected Mean and Variance of z = ln x as Defined by Equation (4) of the Text

Roger J. Owen

Note on notation: For convenience, the symbol ν used in the text has been replaced by n. Also, the random variable y mentioned in the text is symbolized in this Appendix by x.

Let x be a random variable uniformly distributed over (0, 1). Define the random variable z according to

$$
e^{z} =
\begin{cases}
\left(\tfrac{1}{2}\right)^{(n-1)/n} x^{1/n}, & 0 < x \le \tfrac{1}{2}, \\
1 - \left(\tfrac{1}{2}\right)^{(n-1)/n} (1 - x)^{1/n}, & \tfrac{1}{2} < x < 1.
\end{cases}
\tag{B-1}
$$

Notice that z is a continuous monotonic increasing function of x (even at x = 1/2) and so the density of z is

$$p(z) = \frac{dx}{dz}. \tag{B-2}$$

Solving (B-1) for x,

$$
x =
\begin{cases}
2^{n-1} e^{nz}, & -\infty < z \le -\log 2, \\
1 - 2^{n-1} (1 - e^{z})^{n}, & -\log 2 < z < 0,
\end{cases}
\tag{B-3}
$$

where here and elsewhere all logs are to base e. Differentiating (B-3) there follows the density function

$$
p(z) =
\begin{cases}
n 2^{n-1} e^{nz}, & -\infty < z \le -\log 2, \\
n 2^{n-1} (1 - e^{z})^{n-1} e^{z}, & -\log 2 < z < 0.
\end{cases}
\tag{B-4}
$$

Notice that p(z) is continuous in z (even at z = -log 2). It is easy to verify that p(z) integrates to 1 over (-∞, 0). We now want to calculate:

$$
E(z) = n 2^{n-1} \left[ \int_{z=-\infty}^{-\log 2} z e^{nz}\, dz + \int_{-\log 2}^{0} z (1 - e^{z})^{n-1} e^{z}\, dz \right] \tag{B-5}
$$
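The change of variable in this part of the appendix can be checked numerically. The sketch below is an illustration only: it assumes the forward map from x to z takes e^z = (1/2)^((n-1)/n) x^(1/n) on the lower half and the mirrored form on the upper half, with the inverse and density obtained from it as in the derivation. Function names are hypothetical.

```python
import math


def z_from_x(x, n):
    """Forward map: uniform variate x in (0, 1) to z = log of a probability."""
    c = 0.5 ** ((n - 1.0) / n)
    if x <= 0.5:
        y = c * x ** (1.0 / n)
    else:
        y = 1.0 - c * (1.0 - x) ** (1.0 / n)
    return math.log(y)


def x_from_z(z, n):
    """Inverse map: z in (-inf, 0) back to x in (0, 1)."""
    if z <= -math.log(2.0):
        return 2.0 ** (n - 1.0) * math.exp(n * z)
    return 1.0 - 2.0 ** (n - 1.0) * (1.0 - math.exp(z)) ** n


def density_z(z, n):
    """Density p(z) = dx/dz obtained by differentiating the inverse map."""
    if z <= -math.log(2.0):
        return n * 2.0 ** (n - 1.0) * math.exp(n * z)
    return n * 2.0 ** (n - 1.0) * (1.0 - math.exp(z)) ** (n - 1.0) * math.exp(z)


if __name__ == "__main__":
    n = 0.8
    for x in (0.1, 0.25, 0.5, 0.75, 0.9):
        z = z_from_x(x, n)
        print(f"x={x:4.2f}  z={z:8.4f}  back={x_from_z(z, n):6.4f}  p(z)={density_z(z, n):7.4f}")
```

At z = -log 2 both branches of the density reduce to n/2, which is the continuity the appendix remarks on.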

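The first integration is effected by parts:

$$
\int_{z=-\infty}^{-\log 2} z e^{nz}\, dz
= \left[ \frac{z e^{nz}}{n} \right]_{-\infty}^{-\log 2} - \int_{-\infty}^{-\log 2} \frac{e^{nz}}{n}\, dz
= \left[ \frac{nz - 1}{n^{2}}\, e^{nz} \right]_{-\infty}^{-\log 2}
= -\left( \frac{\log 2}{n} + \frac{1}{n^{2}} \right) \left( \frac{1}{2} \right)^{n} \tag{B-6}
$$

The second integral of (B-5) may be evaluated explicitly for any real n by means of a transformation. Transform according to z = log (1 - ξ), a continuous monotonic transformation. Then the second integral

$$
= \int_{0}^{1/2} \xi^{n-1} \log (1 - \xi)\, d\xi \tag{B-6'}
$$

Now log (1 - ξ) can be expanded into the absolutely convergent series $-\sum_{r=1}^{\infty} \xi^{r}/r$ for ξ ∈ (-1, 1) (Whittaker & Watson, 1962, p. 584), a series which is therefore uniformly convergent for ξ ∈ (-a, a) for any a < 1 (Bromwich, 1947, p. 145) and in particular is uniformly convergent for 0 ≤ ξ ≤ 1/2. Hence

$$
\int_{0}^{1/2} \xi^{n-1} \log (1 - \xi)\, d\xi = -\int_{0}^{1/2} \sum_{r=1}^{\infty} \frac{\xi^{n+r-1}}{r}\, d\xi
$$

and (Whittaker & Watson, 1962, p. 79),

$$
= -\sum_{r=1}^{\infty} \frac{1}{r (n + r)\, 2^{n+r}} \tag{B-7}
$$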
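As a sanity check that is not part of the original appendix, the case n = 1 can be worked in closed form. It assumes the transformed integral is $\int_0^{1/2} \xi^{n-1}\log(1-\xi)\,d\xi$ and that its series value is $-\sum_r 1/[r(n+r)2^{n+r}]$, and uses the standard sum $\sum_{r \ge 1} 1/(r 2^{r}) = \log 2$. Both routes should give the same number:

```latex
\begin{align*}
\int_0^{1/2} \log(1-\xi)\,d\xi
  &= \Bigl[-(1-\xi)\log(1-\xi) - \xi\Bigr]_0^{1/2}
   = \tfrac12\log 2 - \tfrac12, \\[4pt]
-\sum_{r=1}^{\infty} \frac{1}{r(r+1)\,2^{r+1}}
  &= -\tfrac12\sum_{r=1}^{\infty}\Bigl(\frac1r - \frac1{r+1}\Bigr)\frac{1}{2^{r}}
   = -\tfrac12\bigl(\log 2 - (2\log 2 - 1)\bigr)
   = \tfrac12\log 2 - \tfrac12 .
\end{align*}
```

The agreement of the two lines supports the reading of the sign and of the exponent n + r in the series.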

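From (B-5), (B-6), and (B-7),

$$
E(z) = -\frac{1}{2} \left[ \log 2 + \frac{1}{n} + n \sum_{r=1}^{\infty} \frac{1}{r (n + r)\, 2^{r}} \right] \tag{B-8}
$$

It is of interest now to calculate

$$
E(z^{2}) = n 2^{n-1} \left[ \int_{z=-\infty}^{-\log 2} z^{2} e^{nz}\, dz + \int_{-\log 2}^{0} z^{2} (1 - e^{z})^{n-1} e^{z}\, dz \right] \tag{B-9}
$$

The first integral, where w = nz,

$$
= \frac{1}{n^{3}} \int_{-\infty}^{-n \log 2} w^{2} e^{w}\, dw
= \frac{1}{n^{3}} \left[ n^{2} (\log 2)^{2} + 2n \log 2 + 2 \right] \left( \frac{1}{2} \right)^{n} \tag{B-10}
$$

With the transformation z = log (1 - ξ) used before, the second integral

$$
= \int_{0}^{1/2} \xi^{n-1} [\log (1 - \xi)]^{2}\, d\xi
$$

Now if both the power series $\sum_{r=0}^{\infty} a_r x^{r}$ and $\sum_{r=0}^{\infty} b_r x^{r}$ converge absolutely in the interval $(-\rho, \rho)$, then their product is the series

$$
\sum_{r=0}^{\infty} (a_0 b_r + a_1 b_{r-1} + \cdots + a_r b_0)\, x^{r}
$$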

convergent in $(-\rho, \rho)$ (Bromwich, 1947, p. 154). Hence multiplying the expansion for log (1 - ξ) with itself there follows the expansion, absolutely convergent for ξ ∈ (-1, 1):

$$
[\log (1 - \xi)]^{2} = \sum_{r=2}^{\infty} C_r \xi^{r} \tag{B-11}
$$

where for r = 2, 3, ...,

$$
C_r = \frac{1}{1\,(r - 1)} + \frac{1}{2\,(r - 2)} + \cdots + \frac{1}{(r - 1)\,1} \tag{B-12}
$$

By the following indirect method, a simpler expression for $C_r$ is obtained. Consider the series obtained by differentiating the terms of (B-11):

$$
\sum_{r=2}^{\infty} r C_r \xi^{r-1} \tag{B-13}
$$

and let R denote its radius of convergence. Referring to Bromwich (1947),

$$
\frac{1}{R} = \lim_{r \to \infty} (r C_r)^{1/(r-1)}
$$

Now

$$
\log\, (r C_r)^{1/(r-1)} = \frac{1}{r - 1} (\log r + \log C_r)
$$

which is approximately equal to $\frac{1}{r} \log C_r$ as $r \to \infty$, and so because of the continuity of log (·),

$$
\frac{1}{R} = \lim_{r \to \infty} C_r^{1/r},
$$

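and (B-13) and (B-11) have the same radius of convergence. It is known that the radius of convergence of (B-11) is 1. Since (B-13) has a radius of convergence of 1, it is uniformly convergent for ξ ∈ (-a, a) for any 0 < a < 1 and the derivative of the sum in (B-11) is given by the sum of the derivatives (B-13), i.e.,

$$
\frac{2}{1 - \xi} \log \frac{1}{1 - \xi} = \sum_{r=2}^{\infty} r C_r \xi^{r-1} \tag{B-14}
$$

Referring to Jolley (1961, p. 22) and comparing terms (a power series has uniquely determined coefficients),

$$
\tfrac{1}{2} (r + 1)\, C_{r+1} = 1 + \tfrac{1}{2} + \cdots + \tfrac{1}{r} \qquad (r = 1, 2, \ldots)
$$

That is,

$$
C_r = \frac{2}{r} \left( 1 + \frac{1}{2} + \cdots + \frac{1}{r - 1} \right) \qquad (r = 2, 3, \ldots) \tag{B-15}
$$

Because (B-11) is uniformly convergent for ξ ∈ (-a, a) for any 0 < a < 1,

$$
\int_{0}^{1/2} \xi^{n-1} [\log (1 - \xi)]^{2}\, d\xi
= \sum_{r=2}^{\infty} C_r \int_{0}^{1/2} \xi^{n+r-1}\, d\xi
= \sum_{r=2}^{\infty} \frac{C_r}{2^{n+r} (n + r)} \tag{B-16}
$$

Substituting (B-10) and (B-16) in (B-9),

$$
E(z^{2}) = \frac{1}{2} (\log 2)^{2} + \frac{1}{n} \log 2 + \frac{1}{n^{2}} + \frac{n}{2} \sum_{r=2}^{\infty} \frac{C_r}{2^{r} (n + r)} \tag{B-17}
$$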
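The simplification of the coefficients can be confirmed with exact rational arithmetic. The sketch below is only a check under the assumption that the direct coefficient is the Cauchy-product sum of 1/[j(r - j)] for j = 1, ..., r - 1 and that the simplified form is (2/r) times the harmonic sum up to 1/(r - 1); the function names are hypothetical.

```python
from fractions import Fraction


def c_direct(r):
    """Coefficient of xi**r from multiplying the log series by itself."""
    return sum(Fraction(1, j * (r - j)) for j in range(1, r))


def c_harmonic(r):
    """Simplified form: (2 / r) * (1 + 1/2 + ... + 1/(r - 1))."""
    harmonic = sum(Fraction(1, k) for k in range(1, r))
    return Fraction(2, r) * harmonic


if __name__ == "__main__":
    for r in range(2, 8):
        print(r, c_direct(r), c_harmonic(r))
```

Because the comparison uses `Fraction`, agreement is exact rather than approximate; the identity follows from 1/[j(r - j)] = (1/r)[1/j + 1/(r - j)].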

and

$$
\operatorname{Var}(z) = E(z^{2}) - [E(z)]^{2} \tag{B-18}
$$
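The series results of this appendix translate directly into a short routine. The following is a minimal sketch, assuming the mean is -(1/2)[log 2 + 1/n + n Σ 1/(r(n+r)2^r)], the second moment is (1/2)(log 2)² + (log 2)/n + 1/n² + (n/2) Σ C_r/(2^r(n+r)), and C_r = (2/r)(1 + 1/2 + ... + 1/(r-1)). The truncation tolerance and the function name are choices made here, not values taken from the source.

```python
import math


def moments(n, tol=1e-15, max_terms=500):
    """Return (E(z), E(z^2), Var(z)) from the truncated series; requires n > 0."""
    log2 = math.log(2.0)

    # First series: sum over r >= 1 of 1 / (r (n + r) 2^r).
    s1 = 0.0
    for r in range(1, max_terms + 1):
        term = 1.0 / (r * (n + r) * 2.0 ** r)
        s1 += term
        if term < tol:
            break

    # Second series: sum over r >= 2 of C_r / (2^r (n + r)).
    s2 = 0.0
    harmonic = 0.0  # running value of 1 + 1/2 + ... + 1/(r - 1)
    for r in range(2, max_terms + 1):
        harmonic += 1.0 / (r - 1)
        c_r = 2.0 * harmonic / r
        term = c_r / (2.0 ** r * (n + r))
        s2 += term
        if term < tol:
            break

    ez = -0.5 * (log2 + 1.0 / n + n * s1)
    ez2 = 0.5 * log2 ** 2 + log2 / n + 1.0 / n ** 2 + 0.5 * n * s2
    return ez, ez2, ez2 - ez * ez


if __name__ == "__main__":
    for n in (0.5, 1.0, 2.0):
        ez, ez2, var = moments(n)
        print(f"n={n:4.1f}  E(z)={ez:9.5f}  E(z^2)={ez2:9.5f}  Var(z)={var:9.5f}")
```

A useful check is n = 1, where x itself is uniform on (0, 1) and z = log x has mean -1, second moment 2, and variance 1.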

References for Appendix B

Bromwich, T. J. I'A. An introduction to the theory of infinite series. London: Macmillan, 1947.

Jolley, L. B. W. Summation of series. New York: Dover Publications, 1961.

Whittaker, E. T., & Watson, G. N. A course of modern analysis. Cambridge: Cambridge University Press, 1962.

APPENDIX C

Computer Program for Estimating ν and m or λ from Given Parameters μ and μ_τ of a Terminal Distribution

This program is written in the TELCOMP language for an interactive computer (see Appendix A). The stored program is given, followed by a sample output. Material typed by the operator is underlined. The several lines with different values of NU report steps in the iterations. The program is normally entered with the instruction "DO PART 1." The program can also be used for printing a table of functions of NU. Appendix D is such a table, obtained by entering the program with

DO PART 1 FOR NU=.5:.5: 2.5 FOR I=1

(However, the row for NU=.0000 was typed by hand since the program will not function for this value.)

1.5; PROG.<ESTNU> ESTIMATE NU, DEPTH OR LAMBDA FOR GIVEN MU,MUTAU
1.136; ENTER "DO PART 1"
1.7 TYPE "GIVE MU, MUTAU (NAT.LOGS) & ADD TITLE OF DATA"
1.1 DEMAND MU,MUTAU
1.11 TYPE #
1.12 SIGMA2=MU-MUTAU,SIGMA=SQRT(SIGMA2)
1.2 DO PART 11 FOR TY=1,2 FOR ANS=
PRINT "TYPE DISTRIBUTION," IF TY=
PRINT "TOKEN DISTRIBUTION," IF TY=
TYPE " MODEL I (CONSTANT DEPTH)",# IF ANS=
TYPE " MODEL II (POISSON DEPTH)",# IF ANS=
M=MUTAU IF TY=
M=MU IF TY=
TO STEP 11.8 IF (SIGMA )/-M< AND ANS=
I=,NU1=.5,NU2=1.5,L1=-.313,L2=
LN1=.3926,LR1=LOG(-SIGMA2/M),D1=LR1-LN1 IF ANS=1
LN1=.5929,LR1=LOG(-(SIGMA )/M),D1=LR1-LN1 IF ANS=
;TYPE LN1,LR1,D1 IN FORM 8; ADD OR OMIT ';' AFTER STEP #
LN2=-.2545,LR2=LR1,D2=LR2-LN2 IF ANS=
LN2=.1533,LR2=LOG(-(SIGMA )/M),D2=LR2-LN2 IF ANS=
;TYPE LN2,LR2,D2 IN FORM 8; ADD OR OMIT ';' AFTER STEP #
V=NU2,NU2=NU1,NU1=V,V=L2,L2=L1,L1=V IF 'D1'<'D2'
V=LN2,LN2=LN1,LN1=V,V=LR2,LR2=LR1,LR1=V IF 'D1'<'D2'
V=D2,D2=D1,D1=V IF 'D1'<'D2'
11.4 NUEST=1↑(L2-D2*(L1-L2)/(D1-D2))
DO PART 1 FOR NU=NUEST
LN2=LOG(RATIO1),D1=D2,L1=L2 IF ANS=
LN2=LOG(RATIO2),LR2=LOG(-(SIGMA2+EZ↑2)/M),D1=D2,L1=L2 IF ANS=
D2=LR2-LN2,DT=NUEST-NU2,NU2=NUEST,L2=LOG(NUEST)
11.58;TYPE LN2,LR2,D2 IN FORM 8; ADD OR OMIT ';' AFTER STEP #
11.6 TO STEP 11.4 IF 'DT'>
DEPTH=M/EZ IF ANS=
LAMBDA=M/EZ-1 IF ANS=
TYPE MUTAU IF TY=
TYPE MU IF TY=
TYPE SIGMA2,SIGMA,NU,EZ,VARZ
TYPE RATIO1,DEPTH,# IF ANS=
TYPE RATIO2,LAMBDA,# IF ANS=
DONE
11.8 TYPE " NU DOESN'T EXIST FOR THIS CASE",#

1.5; COMPUTATION OF E(Z),E(Z↑2),D↑2(Z),RATIO FOR GIVEN NU
1.136; ENTER "DO PART 1 FOR NU=(LIST OR SERIES) FOR I=1"
1.7 TYPE FORM 5 IF I=1; ADD OR OMIT ';' AFTER STEP #
1.8 SUM=131 DELTA=
DO PART 2 FOR R=1
1.2 EZ=-.5*(LN(2)+1/NU+NU*SUM),BRACK=,SUM2=
1.4 DO PART 4 FOR R=2
1.5 EZ2=.5*(LN(2)↑2)+LN(2)/NU+1/NU↑2+NU*SUM2/2,VARZ=EZ2-EZ↑2
1.6 RATIO1=-VARZ/EZ,RATIO2=-EZ2/EZ
1.8 TYPE NU,EZ,EZ2,VARZ,RATIO1,RATIO2 IN FORM
=I+1

A=1/(R*(NU+R)*2↑R),SUM=SUM+A
2.3 R=R+1
2.4 TO STEP 2.5 IF A>DELTA
4.1 BRACK=BRACK+1/(R-1),CR=2*BRACK/R,B=CR/(2↑R*(NU+R)),SUM2=SUM2+B
4.51 R=R+1
4.6 TO STEP 4.1 IF B>DELTA
FORM 1
  R A SUM
FORM 2
  #1 #.####N#t1t #.######1tt
FORM 3
  R BRACK CR B SUM2
FORM 4
  ## H.#NHH##??t #.######1tt H.######1tt H.#HH###Ttt
FORM 5
  NU E(Z) E(Z↑2) VAR(Z) RATIO1 RATIO2
FORM 6
  HU##.U##H N#.#####,tt (f.#####t1t #.##H#Httt #.H#N##TTT H.H#HfI#ttt
FORM 1 RNU R OIl'
FORM 8
  #. #H#### tt T #.######ttt #.######ttt

DO PART 1
GIVE MU, MUTAU (NAT.LOGS) & ADD TITLE OF DATA
MU= MUTAU= ; LORGE MAGAZINE COUNT DATA

TYPE DISTRIBUTION, MODEL I (CONSTANT DEPTH)
NU E(Z) E(Z↑2) VAR(Z) RATIO1 RATIO2
MUTAU= SIGMA2= 11~8657 SIGMA= NU= EZ= VARZ= RATIO1= DEPTH=

TOKEN DISTRIBUTION, MODEL I (CONSTANT DEPTH)
NU E(Z) E(Z↑2) VAR(Z) RATIO1 RATIO2
MU= SIGMA2= SIGMA= 3~ NU= EZ= -1; VARZ= RATIO1= DEPTH=

TYPE DISTRIBUTION, MODEL II (POISSON DEPTH)
NU DOESN'T EXIST FOR THIS CASE

TOKEN DISTRIBUTION, MODEL II (POISSON DEPTH)
NU E(Z) E(Z↑2) VAR(Z) RATIO1 RATIO2
MU= SIGMA2= SIGMA= NU= EZ= VARZ= RATIO2= LAMBDA=
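The estimation step for Model I can be sketched as follows. This is an illustration under stated assumptions, not a transcription of the TELCOMP program: it assumes, from the listing's RATIO1 = -VARZ/EZ and DEPTH = M/EZ, that ν is chosen so that -Var(z)/E(z) equals σ²/(-M), where M is the mean log probability (MUTAU for types, MU for tokens) and σ² = MU - MUTAU. The listing appears to iterate by interpolating on logarithmic scales; plain bisection is substituted here for simplicity, and the bracket [0.05, 50] is an arbitrary choice. All function names are hypothetical.

```python
import math


def moments(n, tol=1e-15, max_terms=500):
    """(E(z), E(z^2), Var(z)) from the Appendix B series; requires n > 0."""
    log2 = math.log(2.0)
    s1 = 0.0
    for r in range(1, max_terms + 1):
        term = 1.0 / (r * (n + r) * 2.0 ** r)
        s1 += term
        if term < tol:
            break
    s2 = 0.0
    harmonic = 0.0  # running value of 1 + 1/2 + ... + 1/(r - 1)
    for r in range(2, max_terms + 1):
        harmonic += 1.0 / (r - 1)
        term = (2.0 * harmonic / r) / (2.0 ** r * (n + r))
        s2 += term
        if term < tol:
            break
    ez = -0.5 * (log2 + 1.0 / n + n * s1)
    ez2 = 0.5 * log2 ** 2 + log2 / n + 1.0 / n ** 2 + 0.5 * n * s2
    return ez, ez2, ez2 - ez * ez


def estimate_model_one(mean_log, sigma2, lo=0.05, hi=50.0, iters=200):
    """Return (nu, depth) for Model I by bisection on -Var(z)/E(z).

    mean_log is the (negative) mean of the log probabilities; sigma2 is
    their variance. Assumes the ratio decreases with nu inside the bracket.
    """
    target = sigma2 / (-mean_log)

    def gap(nu):
        ez, _, var = moments(nu)
        return -var / ez - target

    if gap(lo) <= 0.0 or gap(hi) >= 0.0:
        raise ValueError("no solution for nu inside the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    ez, _, _ = moments(nu)
    return nu, mean_log / ez


if __name__ == "__main__":
    # Round trip on synthetic values: depth 5 with nu = 0.8.
    ez, _, var = moments(0.8)
    print(estimate_model_one(5 * ez, 5 * var))
```

The round trip in the usage example builds a mean and variance from a known ν and depth and recovers them; the parameter values there are synthetic and are not the Lorge or LIGHT estimates.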

APPENDIX D

Table of Functions of ν

This table was produced by the computer program in Appendix C. It gives, in floating-point notation, values of E{z}, E{z²}, Var{z}, P1, and P2, as defined in the text by formulas (7), (8), (9), (2), and (3), respectively.

TABLE OF FUNCTIONS OF NU (ν)

NU E(Z) E(Z↑2) VAR(Z) RATIO1 RATIO2


COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551

More information

NUMBERS, MATHEMATICS AND EQUATIONS

NUMBERS, MATHEMATICS AND EQUATIONS AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t

More information

COASTAL ENGINEERING Chapter 2

COASTAL ENGINEERING Chapter 2 CASTAL ENGINEERING Chapter 2 GENERALIZED WAVE DIFFRACTIN DIAGRAMS J. W. Jhnsn Assciate Prfessr f Mechanical Engineering University f Califrnia Berkeley, Califrnia INTRDUCTIN Wave diffractin is the phenmenn

More information

Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017

Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017 Resampling Methds Crss-validatin, Btstrapping Marek Petrik 2/21/2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins in R (Springer, 2013) with

More information

Math Foundations 10 Work Plan

Math Foundations 10 Work Plan Math Fundatins 10 Wrk Plan Units / Tpics 10.1 Demnstrate understanding f factrs f whle numbers by: Prime factrs Greatest Cmmn Factrs (GCF) Least Cmmn Multiple (LCM) Principal square rt Cube rt Time Frame

More information

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems.

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems. Building t Transfrmatins n Crdinate Axis Grade 5: Gemetry Graph pints n the crdinate plane t slve real-wrld and mathematical prblems. 5.G.1. Use a pair f perpendicular number lines, called axes, t define

More information

Interference is when two (or more) sets of waves meet and combine to produce a new pattern.

Interference is when two (or more) sets of waves meet and combine to produce a new pattern. Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme

More information

Hubble s Law PHYS 1301

Hubble s Law PHYS 1301 1 PHYS 1301 Hubble s Law Why: The lab will verify Hubble s law fr the expansin f the universe which is ne f the imprtant cnsequences f general relativity. What: Frm measurements f the angular size and

More information

How do scientists measure trees? What is DBH?

How do scientists measure trees? What is DBH? Hw d scientists measure trees? What is DBH? Purpse Students develp an understanding f tree size and hw scientists measure trees. Students bserve and measure tree ckies and explre the relatinship between

More information

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs Admissibility Cnditins and Asympttic Behavir f Strngly Regular Graphs VASCO MOÇO MANO Department f Mathematics University f Prt Oprt PORTUGAL vascmcman@gmailcm LUÍS ANTÓNIO DE ALMEIDA VIEIRA Department

More information

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse

More information

Lab 1 The Scientific Method

Lab 1 The Scientific Method INTRODUCTION The fllwing labratry exercise is designed t give yu, the student, an pprtunity t explre unknwn systems, r universes, and hypthesize pssible rules which may gvern the behavir within them. Scientific

More information

Pipetting 101 Developed by BSU CityLab

Pipetting 101 Developed by BSU CityLab Discver the Micrbes Within: The Wlbachia Prject Pipetting 101 Develped by BSU CityLab Clr Cmparisns Pipetting Exercise #1 STUDENT OBJECTIVES Students will be able t: Chse the crrect size micrpipette fr

More information

A Correlation of. to the. South Carolina Academic Standards for Mathematics Precalculus

A Correlation of. to the. South Carolina Academic Standards for Mathematics Precalculus A Crrelatin f Suth Carlina Academic Standards fr Mathematics Precalculus INTRODUCTION This dcument demnstrates hw Precalculus (Blitzer), 4 th Editin 010, meets the indicatrs f the. Crrelatin page references

More information

Study Group Report: Plate-fin Heat Exchangers: AEA Technology

Study Group Report: Plate-fin Heat Exchangers: AEA Technology Study Grup Reprt: Plate-fin Heat Exchangers: AEA Technlgy The prblem under study cncerned the apparent discrepancy between a series f experiments using a plate fin heat exchanger and the classical thery

More information

Physics 2010 Motion with Constant Acceleration Experiment 1

Physics 2010 Motion with Constant Acceleration Experiment 1 . Physics 00 Mtin with Cnstant Acceleratin Experiment In this lab, we will study the mtin f a glider as it accelerates dwnhill n a tilted air track. The glider is supprted ver the air track by a cushin

More information

ABSORPTION OF GAMMA RAYS

ABSORPTION OF GAMMA RAYS 6 Sep 11 Gamma.1 ABSORPTIO OF GAMMA RAYS Gamma rays is the name given t high energy electrmagnetic radiatin riginating frm nuclear energy level transitins. (Typical wavelength, frequency, and energy ranges

More information

EASTERN ARIZONA COLLEGE Introduction to Statistics

EASTERN ARIZONA COLLEGE Introduction to Statistics EASTERN ARIZONA COLLEGE Intrductin t Statistics Curse Design 2014-2015 Curse Infrmatin Divisin Scial Sciences Curse Number PSY 220 Title Intrductin t Statistics Credits 3 Develped by Adam Stinchcmbe Lecture/Lab

More information

Verification of Quality Parameters of a Solar Panel and Modification in Formulae of its Series Resistance

Verification of Quality Parameters of a Solar Panel and Modification in Formulae of its Series Resistance Verificatin f Quality Parameters f a Slar Panel and Mdificatin in Frmulae f its Series Resistance Sanika Gawhane Pune-411037-India Onkar Hule Pune-411037- India Chinmy Kulkarni Pune-411037-India Ojas Pandav

More information

Hypothesis Tests for One Population Mean

Hypothesis Tests for One Population Mean Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be

More information

Flipping Physics Lecture Notes: Simple Harmonic Motion Introduction via a Horizontal Mass-Spring System

Flipping Physics Lecture Notes: Simple Harmonic Motion Introduction via a Horizontal Mass-Spring System Flipping Physics Lecture Ntes: Simple Harmnic Mtin Intrductin via a Hrizntal Mass-Spring System A Hrizntal Mass-Spring System is where a mass is attached t a spring, riented hrizntally, and then placed

More information

Tree Structured Classifier

Tree Structured Classifier Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients

More information

Mathematics Methods Units 1 and 2

Mathematics Methods Units 1 and 2 Mathematics Methds Units 1 and 2 Mathematics Methds is an ATAR curse which fcuses n the use f calculus and statistical analysis. The study f calculus prvides a basis fr understanding rates f change in

More information

Flipping Physics Lecture Notes: Simple Harmonic Motion Introduction via a Horizontal Mass-Spring System

Flipping Physics Lecture Notes: Simple Harmonic Motion Introduction via a Horizontal Mass-Spring System Flipping Physics Lecture Ntes: Simple Harmnic Mtin Intrductin via a Hrizntal Mass-Spring System A Hrizntal Mass-Spring System is where a mass is attached t a spring, riented hrizntally, and then placed

More information

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*

More information

Pressure And Entropy Variations Across The Weak Shock Wave Due To Viscosity Effects

Pressure And Entropy Variations Across The Weak Shock Wave Due To Viscosity Effects Pressure And Entrpy Variatins Acrss The Weak Shck Wave Due T Viscsity Effects OSTAFA A. A. AHOUD Department f athematics Faculty f Science Benha University 13518 Benha EGYPT Abstract:-The nnlinear differential

More information

22.54 Neutron Interactions and Applications (Spring 2004) Chapter 11 (3/11/04) Neutron Diffusion

22.54 Neutron Interactions and Applications (Spring 2004) Chapter 11 (3/11/04) Neutron Diffusion .54 Neutrn Interactins and Applicatins (Spring 004) Chapter (3//04) Neutrn Diffusin References -- J. R. Lamarsh, Intrductin t Nuclear Reactr Thery (Addisn-Wesley, Reading, 966) T study neutrn diffusin

More information

Sections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic.

Sections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic. Tpic : AC Fundamentals, Sinusidal Wavefrm, and Phasrs Sectins 5. t 5., 6. and 6. f the textbk (Rbbins-Miller) cver the materials required fr this tpic.. Wavefrms in electrical systems are current r vltage

More information

Document for ENES5 meeting

Document for ENES5 meeting HARMONISATION OF EXPOSURE SCENARIO SHORT TITLES Dcument fr ENES5 meeting Paper jintly prepared by ECHA Cefic DUCC ESCOM ES Shrt Titles Grup 13 Nvember 2013 OBJECTIVES FOR ENES5 The bjective f this dcument

More information

Homology groups of disks with holes

Homology groups of disks with holes Hmlgy grups f disks with hles THEOREM. Let p 1,, p k } be a sequence f distinct pints in the interir unit disk D n where n 2, and suppse that fr all j the sets E j Int D n are clsed, pairwise disjint subdisks.

More information

An Introduction to Complex Numbers - A Complex Solution to a Simple Problem ( If i didn t exist, it would be necessary invent me.

An Introduction to Complex Numbers - A Complex Solution to a Simple Problem ( If i didn t exist, it would be necessary invent me. An Intrductin t Cmple Numbers - A Cmple Slutin t a Simple Prblem ( If i didn t eist, it wuld be necessary invent me. ) Our Prblem. The rules fr multiplying real numbers tell us that the prduct f tw negative

More information

ENSC Discrete Time Systems. Project Outline. Semester

ENSC Discrete Time Systems. Project Outline. Semester ENSC 49 - iscrete Time Systems Prject Outline Semester 006-1. Objectives The gal f the prject is t design a channel fading simulatr. Upn successful cmpletin f the prject, yu will reinfrce yur understanding

More information

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0

More information

Five Whys How To Do It Better

Five Whys How To Do It Better Five Whys Definitin. As explained in the previus article, we define rt cause as simply the uncvering f hw the current prblem came int being. Fr a simple causal chain, it is the entire chain. Fr a cmplex

More information

BASD HIGH SCHOOL FORMAL LAB REPORT

BASD HIGH SCHOOL FORMAL LAB REPORT BASD HIGH SCHOOL FORMAL LAB REPORT *WARNING: After an explanatin f what t include in each sectin, there is an example f hw the sectin might lk using a sample experiment Keep in mind, the sample lab used

More information

BOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS. Christopher Costello, Andrew Solow, Michael Neubert, and Stephen Polasky

BOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS. Christopher Costello, Andrew Solow, Michael Neubert, and Stephen Polasky BOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS Christpher Cstell, Andrew Slw, Michael Neubert, and Stephen Plasky Intrductin The central questin in the ecnmic analysis f climate change plicy cncerns

More information

Preparation work for A2 Mathematics [2017]

Preparation work for A2 Mathematics [2017] Preparatin wrk fr A2 Mathematics [2017] The wrk studied in Y12 after the return frm study leave is frm the Cre 3 mdule f the A2 Mathematics curse. This wrk will nly be reviewed during Year 13, it will

More information

Comprehensive Exam Guidelines Department of Chemical and Biomolecular Engineering, Ohio University

Comprehensive Exam Guidelines Department of Chemical and Biomolecular Engineering, Ohio University Cmprehensive Exam Guidelines Department f Chemical and Bimlecular Engineering, Ohi University Purpse In the Cmprehensive Exam, the student prepares an ral and a written research prpsal. The Cmprehensive

More information

(1.1) V which contains charges. If a charge density ρ, is defined as the limit of the ratio of the charge contained. 0, and if a force density f

(1.1) V which contains charges. If a charge density ρ, is defined as the limit of the ratio of the charge contained. 0, and if a force density f 1.0 Review f Electrmagnetic Field Thery Selected aspects f electrmagnetic thery are reviewed in this sectin, with emphasis n cncepts which are useful in understanding magnet design. Detailed, rigrus treatments

More information

Least Squares Optimal Filtering with Multirate Observations

Least Squares Optimal Filtering with Multirate Observations Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical

More information

Determining Optimum Path in Synthesis of Organic Compounds using Branch and Bound Algorithm

Determining Optimum Path in Synthesis of Organic Compounds using Branch and Bound Algorithm Determining Optimum Path in Synthesis f Organic Cmpunds using Branch and Bund Algrithm Diastuti Utami 13514071 Prgram Studi Teknik Infrmatika Seklah Teknik Elektr dan Infrmatika Institut Teknlgi Bandung,

More information

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the

More information

AP Statistics Notes Unit Five: Randomness and Probability

AP Statistics Notes Unit Five: Randomness and Probability AP Statistics Ntes Unit Five: Randmness and Prbability Syllabus Objectives: 3.1 The student will interpret prbability, including the lng-term relative frequency distributin. 3.2 The student will discuss

More information

Methods for Determination of Mean Speckle Size in Simulated Speckle Pattern

Methods for Determination of Mean Speckle Size in Simulated Speckle Pattern 0.478/msr-04-004 MEASUREMENT SCENCE REVEW, Vlume 4, N. 3, 04 Methds fr Determinatin f Mean Speckle Size in Simulated Speckle Pattern. Hamarvá, P. Šmíd, P. Hrváth, M. Hrabvský nstitute f Physics f the Academy

More information

AIP Logic Chapter 4 Notes

AIP Logic Chapter 4 Notes AIP Lgic Chapter 4 Ntes Sectin 4.1 Sectin 4.2 Sectin 4.3 Sectin 4.4 Sectin 4.5 Sectin 4.6 Sectin 4.7 4.1 The Cmpnents f Categrical Prpsitins There are fur types f categrical prpsitins. Prpsitin Letter

More information