An Index for SSRN Downloads


Zura Kakushadze [1]

Quantigic Solutions LLC, High Ridge Road, #135, Stamford, CT [2]
Free University of Tbilisi, Business School & School of Physics, 240, David Agmashenebeli Alley, Tbilisi, 0159, Georgia

September 5, 2015; revised November 13, 2015

To my mother Ludmila (Mila) Kakushadze on the occasion of her upcoming birthday

Abstract

We propose a new index to quantify SSRN downloads. Unlike the SSRN downloads rank, which is based on the total number of an author's SSRN downloads, our index also reflects the author's productivity by taking into account the download numbers for the papers. Our index is inspired by, but is not the same as, Hirsch's h-index for citations, which cannot be directly applied to SSRN downloads. We analyze data for about 30,000 authors and 367,000 papers. We find a simple empirical formula for the SSRN author rank via a Gaussian function of the log of the number of downloads.

[1] Zura Kakushadze, Ph.D., is the President and a Co-Founder of Quantigic Solutions LLC and a Full Professor in the Business School and the School of Physics at Free University of Tbilisi. Email: zura@quantigic.com

[2] DISCLAIMER: This address is used by the corresponding author for no purpose other than to indicate his professional affiliation as is customary in publications. In particular, the contents of this paper are not intended as an investment, legal, tax or any other such advice, and in no way represent views of Quantigic Solutions LLC, the website or any of their other affiliates.

1. Introduction

In many scientific disciplines, e.g., physics, the total number of a researcher's citations is considered an important metric of the researcher's scientific impact. However, it does not take into account the author's productivity: an author may write just a single paper garnering many citations. Complementarily, the number of publications is also an important metric. However, it does not account for how important any of the researcher's (possibly numerous) papers are.

Hirsch (2005) proposed an index, the h-index [3], whose purpose is to combine into a single number both the citations and publications figures. The appeal of the h-index is that not only is it intuitive and simple to compute, but it also requires only data for a given author (publications and citations) and no cross-sectional data (across a sample of authors). E.g., INSPIRE High-Energy Physics Literature Database, Thomson Reuters Web of Science and Google Scholar all have utilized the h-index. Also, a variation of it has been applied to internet media (Hovden, 2013).

Social Science Research Network (SSRN) keeps track of the numbers of downloads for each author and paper. The number of SSRN downloads has, perhaps not surprisingly in hindsight, become an important metric in its own right. SSRN ranks authors and papers by the number of downloads. However, just as with citations, the number of downloads for a given author does not take into account the author's productivity, that is, the number of publications (or papers).

In this note we propose a new index for SSRN downloads. Unlike the SSRN downloads rank, which is based on the total number of an author's SSRN downloads, our index also reflects the author's productivity by taking into account the download numbers for the papers. Our index is inspired by, but is not the same as, the h-index.
Thus, if we apply the h-index to SSRN downloads (by simply replacing citations with downloads), we find that the h-index mostly equals the number of papers: the bulk of the numbers of downloads exceeds the bulk of the numbers of citations by roughly a few orders of magnitude, so the h-index is uninformative. [4] We circumvent this difficulty by noting that the numbers of downloads (in fact, just as the numbers of citations) have quasi-log-normal distributions. I.e., the numbers of downloads (citations) are exponential by nature.

[3] A scientist has index h if h of his/her $n$ papers have at least h citations each, and the other $(n - h)$ papers have no more than h citations each (Hirsch, 2005). This is the same as the Eddington number in cycling, with papers replaced by days and citations replaced by miles cycled on a given day. Its inventor, Sir Arthur Stanley Eddington (1882-1944), was a renowned British astronomer, physicist and mathematician and an avid cyclist.

[4] The h-index for citations turns out to be informative due to a numerological accident: the number of papers is typically bounded by (low) 2 digits, and the number of citations is bounded by roughly a square of that. The same does not hold for downloads, which are roughly a few orders of magnitude more numerous. We could rescale the numbers of downloads by an overall factor based on cross-sectional data, thereby forgoing simplicity.
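To make the last point concrete, the h-index definition from footnote 3 can be applied verbatim to download counts. The sketch below is only an illustration: the function name `h_index` and both sample vectors are hypothetical and are not taken from the SSRN data.

```python
def h_index(counts):
    """Largest h such that h items have at least h counts each."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical author with five papers, counted two ways.
citations = [25, 8, 5, 3, 1]
downloads = [1200, 850, 400, 90, 35]

print(h_index(citations))  # 3
print(h_index(downloads))  # 5, i.e., simply the number of papers
```

With download counts in the tens to thousands, every paper clears the rank threshold, so the result collapses to the paper count, which is the sense in which the h-index is uninformative here.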

We therefore define the following index, call it $k$ for the lack of a better name, for SSRN downloads for a given author:

$$k = \max_r \min(c_r, r) \tag{1}$$

$$c_r = \lfloor \ln(d_r) \rfloor \tag{2}$$

where $d_r$ is the number of downloads for the $r$-th paper by said author; $r = 1, \dots, n$; $n$ is the number of papers with $d_r > 0$; and the $d_r$ are ordered decreasingly: $d_1 \geq d_2 \geq \dots \geq d_n$. As we discuss in more detail below, the index $k$ appears to produce reasonable statistical output.

Let us recast (1) and (2) in plain English. For a given author, take all papers with nonzero downloads. Sort these papers in decreasing order of the number of downloads. For each paper, take the integer part (the floor) of the natural log of the number of downloads, which is $c_r$. Let us call $c_r$ log-downloads. Then the author's index equals $k$ if $k$ of the papers have at least $k$ log-downloads each, and the other $(n - k)$ papers have no more than $k$ log-downloads each. This is the same as the h-index with the numbers of citations replaced by log-downloads.

The index $k$ is integer-valued. Currently, it varies from 0 to 9 (we discuss our data in detail in the next section). So, many authors share the same $k$. We can add granularity via noninteger indexes $k'$ and $k^*$ (cf. (Ruane and Tol, 2008)):

$$k' = \ln(k^*) = \frac{(k+1)\ln(d_k) - k\,y}{1 + \ln(d_k) - y} \tag{3}$$

Here $y = \ln(d_{k+1})$ if $k < n$; otherwise, $y = k$. By definition, $k \leq k' < k+1$ and $\lfloor k' \rfloor = k$. Eq. (3) has a simple geometric meaning. First, consider the case $k < n$. Once we fix $k$ via Eqs. (1) and (2), we linearly interpolate between the points $(k, \ln(d_k))$ and $(k+1, \ln(d_{k+1}))$ (cf. (Rousseau, 2006, 2014)). On that line there always exists a point $(k', k')$, which defines $k'$. For $k = n$ we interpolate between $(k, \ln(d_k))$ and $(k+1, k)$, with $k$ being a natural estimate for $\ln(d_{k+1})$.

We compute $k$, $k'$ and $k^*$ for about 30,000 SSRN authors (see Section 2). Table 1 gives the top 20 authors by their $k'$ values and shows that the ranking of authors by $k'$ does not coincide with their SSRN rank (similarly to ranking by the h-index v. ranking by citations). Figure 1 illustrates the computation of the $k$, $k'$ and $k^*$ indexes.
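The plain-English recipe above (sort, take the floor of the natural log, then apply the h-index rule) and the interpolation step can be sketched in a few lines. This is a minimal illustration and not the code used in the paper: the function names and the sample vector are hypothetical, and returning 0.0 when the integer index is 0 is an assumption, since the text does not spell out that case.

```python
import math


def sorted_positive(downloads):
    """Papers with nonzero downloads, in decreasing order."""
    return sorted((d for d in downloads if d > 0), reverse=True)


def integer_index(downloads):
    """h-index rule applied to floor(ln(downloads)), i.e. log-downloads."""
    d = sorted_positive(downloads)
    log_downloads = [math.floor(math.log(x)) for x in d]
    return max((min(c, r) for r, c in enumerate(log_downloads, start=1)),
               default=0)


def interpolated_index(downloads):
    """Non-integer refinement: where the line through (k, ln d_k) and
    (k+1, ln d_{k+1}) crosses the diagonal."""
    d = sorted_positive(downloads)
    k = integer_index(downloads)
    if k == 0:
        return 0.0  # assumption: case not covered in the text
    y_k = math.log(d[k - 1])
    # If there is no (k+1)-th paper, k stands in for ln d_{k+1}.
    y_next = math.log(d[k]) if k < len(d) else float(k)
    return ((k + 1) * y_k - k * y_next) / (1.0 + y_k - y_next)


sample = [5000, 1200, 400, 150, 60, 20, 3]  # hypothetical download counts
print(integer_index(sample))                 # 4
print(round(interpolated_index(sample), 2))  # 4.53
```

Here the log-downloads are (8, 7, 5, 5, 4, 2, 1), so four papers have at least 4 log-downloads each, and the interpolated value lands between 4 and 5 as expected.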
Both the $k'$ and $k^*$ indexes are equally informative and, as a matter of preference, we can use either one interchangeably.

In Section 2 we discuss our data and its statistical characteristics for about 30,000 SSRN authors and over 367,000 papers. We find a simple empirical formula for the SSRN author rank $R$ (see Section 2 for the empirical values of the numeric coefficients $a$, $b$, $c$):

$$\ln(R) \approx a + b\,\ln(D) + c\,\ln^2(D) \tag{4}$$

where $D$ is the author's total number of downloads. In Section 3 we discuss some statistical properties of our indexes $k$, $k'$ and $k^*$, including analogs of ratios considered for the h-index. In Subsection 3.1 we discuss another index for SSRN downloads, inspired by the g-index (Egghe, 2006), and how it compares to the index $k$. We briefly conclude in Section 4, where we discuss advantages of our proposal, some caveats and (in some cases) how to cope with them.

As mentioned above, our indexes for SSRN downloads are related to the h-index (and, thereby, the Eddington number) by virtue of their definitions. In this regard, let us mention some prior works. Here we will not attempt a comprehensive overview of the literature on the h-index and various related indexes; for detailed reviews and extensive lists of references, see, e.g., (Alonso et al, 2009), (Egghe, 2010) and (Norris and Oppenheim, 2010). Instead, we focus on prior works with some potential relevance to the approach we follow in this paper. Thus, various nonlinear generalizations of the h-index have been proposed, e.g., the h(2) index (Kosmulski, 2006) and its generalizations (Levitt and Thelwall, 2007; Deineko and Woeginger, 2009). The h(2) index was applied to article downloads as a metric for academic journals (Hua et al, 2009). Logarithms of citations have been considered in other contexts such as ranking (see, e.g., (Lundberg, 2007) and (Stringer, Sales-Pardo and Amaral, 2008)). However, to our knowledge, our log-based indexes, which stem from our observation of the exponential nature of SSRN downloads (and other metrics, including citations; see Subsection 3.5), are the first of their kind. Also, our application of indexes of this kind to SSRN downloads is novel, and the beauty of working with SSRN downloads data is that it is large and provides lots of statistics. Currently, SSRN uses vanilla downloads (and, secondarily, citations) to rank authors and papers.
Variations on the h-index theme include the aforementioned g-index of (Egghe, 2006) (which is analogous to the h(2) index with citations replaced by cumulative citations), the hg-index (Alonso et al, 2010), the $h_\alpha$-index (Van Eck and Waltman, 2008), the A-index (Jin, 2006; Rousseau, 2006) and its variation the m-index (Bornmann, Mutz and Daniel, 2008), the R-index and the AR-index (Jin et al, 2007) (also see (Jarvelin and Persson, 2008)), the citation-weighted h-index (Egghe and Rousseau, 2008), the contemporary h-index, the trend h-index and the normalized h-index (Sidiropoulos, Katsaros and Manolopoulos, 2007), the dynamic h-type index (Rousseau and Ye, 2008), the tapered h-index (Anderson, Hankin and Killworth, 2008), variants accounting for multiple authors (Schreiber, 2008; Batista et al, 2006; Bornmann and Daniel, 2007; Imperial and Rodriguez-Navarro, 2007; Egghe, 2008), and other variations of the h-index.

2. Data

In this section we describe our dataset. We downloaded all of our data directly from the SSRN website. The download was automated, except for some manual patches (see below).

SSRN provides the Top Authors data for the top 30,000 authors [5] based on downloads, both overall and for the last 12 months. We downloaded this data on 08/11/2015. [6] The data contains links to the authors' freely accessible individual webpages with their scholarly papers. Out of the 30,000 webpages 15 turned out to be bad, so our dataset contains 29,985 authors. The Top Authors data consists of 300 webpages, with 100 authors per page. Among other data, for each author these webpages contain the total number of downloads $D$ and the total number of papers $n$ (overall and for the last 12 months). Table 2 gives summaries for $D$ and $n$ as well as downloads-per-paper $D/n$ and their logarithms.

Figure 2 plots densities and histograms for $\ln(D)$ (overall and 12 mo). The distribution for overall $\ln(D)$ is quasi-normal, so the distribution for overall $D$ is quasi-log-normal, i.e., it is skewed with a long tail at the higher end. The 12-mo distribution is even more skewed. It is evident that we should work not with the numbers of downloads but with their logs. This conclusion is further supported by the densities (and histograms) for downloads-per-paper in Figure 3. Furthermore, the density for overall $\ln(D/n)$ in Figure 3 is very close to a Gaussian. In Figure 4 we plot the same density together with a Gaussian curve from a least-squares fit. [7]

Here we should remark that the Top Authors data includes all papers in $n$, and thereby in the $D/n$ computation, even those that SSRN does not include in the computation of $D$ (such as an author's so-called "other papers"). [8] Furthermore, there are regular papers with no downloads for good reasons, e.g., papers with abstracts only and without downloadable PDF files. In many cases this is due to the policies of the journals where such papers are published, which do not permit posting published papers on the internet, including SSRN. Keeping such papers in $n$ in the $D/n$ computation artificially lowers the downloads-per-paper figures.
However, these nuances do not affect our conclusion relating to quasi-log-normality.

[5] SSRN provides the Top Authors data for all authors and also separately for the law, business and economics authors, with such data for the accounting and finance authors apparently forthcoming. We analyzed the data for all authors. It would be interesting to repeat our analysis for the above 5 disciplines (when they become available).

[6] Accessing this data beyond the top 10 authors requires an SSRN account login. The downloaded webpages state that the data was last updated on 07/27/2015.

[7] We use the R function optim() to determine the three parameters in the fit (mean, standard deviation and maximum value); see Figure 4.

[8] E.g., in the Top Authors data, P. Fernandez's (SSRN ID 12696) $D$ is based on 230 of his papers. His 3 "other papers" (in the SSRN terminology) do not contribute to the total number of downloads; however, in the Top Authors data $n = 233$, and this is the number used in computing $D/n$. All figures are as of the date we downloaded the data (see above). Using $n = 230$ would appear to be better.

6 2.1. SSRN Rank v. Downloads The density plots in Figures 2 and 3 are rather convincing: the numbers of downloads are exponential by nature. Let be the number of authors in our data. Then the SSRN author rank is given by (where the rank is computed across all authors in the Top Authors data) = +1 rank( ) (5) We plot v. ln( ) and ln( ) v. ln( ) in Figure 5 (overall and 12 mo). Let us start with the overall downloads. Except for the top 3 outliers (M.C. Jensen, P. Fernandez and E.F. Fama; see Table 1), the lower-left curve in Figure 5 is almost parabolic. A quadratic curve fits the data very well indeed. The results for the fit using a polynomial regression are given in Table 3. So, we have the empirical formula (4) with the numeric coefficients 4.704, 2.009, Adding a cubic term does not improve the fit. The inflection point in the upper-left curve in Figure 5 occurs around The results for the quadratic fit for the 12-mo downloads is summarized in Table 4. The empirical formula (4) also holds in this case with the numeric coefficients 11.18, 0.492, A cubic term does not improve the fit. The inflection point in the upper-right curve in Figure 5 occurs around Top Papers Data SSRN provides the Top Papers data for the top 10,000 papers 10 based on downloads, both overall and for the last 12 months. We downloaded this data on 08/13/ The Top Papers data consists of 100 webpages, with 100 papers per page. Among other data, for each paper these webpages contain the number of downloads (overall and 12 mo). Table 5 gives summaries for and its log. Figure 6 plots densities and histograms for ln (overall and 12 mo). The results are qualitatively similar to those for ln( ); see Table 2 and Figure 2. Let be the number of papers in our data. Then the SSRN paper rank is given by (where the rank is computed across all papers in the Top Papers data) = +1 rank( ) (6) 9 There are more of what can be deemed as outliers in Figure 5, lower-right corner: P. Fernandez (139,290), M.C. 
Jensen (66,244), M.O. Jackson (41,580), M.T. Faber (37,553), A. Damodaran (34,523), C.R. Harvey (32,171), H.M. Mialon (32,072), E.F. Fama (32,060), C.R. Sunstein (31,681), B. Bartlett (31,583) and A.M. Francis (31,517), with the total number of downloads in the last 12 months given in the parentheses. 10 Unlike the Top Authors data, the Top Papers data does not appear to be available by (any) discipline. 11 Accessing this data beyond the top 10 papers requires an SSRN account login. The downloaded webpages state that the data was last updated on 08/09/
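The rank-versus-downloads analysis of Subsection 2.1 (rank 1 for the most downloaded author, then a quadratic fit of the log-rank in the log of total downloads) can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the code behind Tables 3 and 4: the function names are hypothetical, ties are broken by input order, the fit solves the least-squares normal equations directly, and the data used to exercise it is synthetic.

```python
import math


def descending_rank(totals):
    """Rank 1 = largest total; ties broken by input order (assumption)."""
    order = sorted(range(len(totals)), key=lambda i: -totals[i])
    ranks = [0] * len(totals)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def fit_quadratic(xs, ys):
    """Least-squares fit y ~ a + b*x + c*x^2 via the normal equations."""
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Synthetic check: points generated from a known parabola are recovered.
xs = [0.5 * i for i in range(13)]
ys = [1.0 + 2.0 * x - 0.1 * x * x for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(round(a, 6), round(b, 6), round(c, 6))

# Usage with (hypothetical) download totals:
totals = [500, 9000, 120, 47000, 2300]
ranks = descending_rank(totals)
log_d = [math.log(d) for d in totals]
log_r = [math.log(r) for r in ranks]
coefficients = fit_quadratic(log_d, log_r)
```

A negative quadratic coefficient in such a fit corresponds to the rank being a Gaussian function of the log of the number of downloads, as stated in the abstract.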

7 We plot v. ln and ln( ) v. ln in Figure 7 (overall and 12 mo). Let us start with the overall downloads. As above, except for several top outliers, 12 the lower-left curve in Figure 7 is almost parabolic. A quadratic curve fits the data very well indeed. The results for the fit using a polynomial regression are given in Table 6. So, we have the empirical formula (4) (where is replaced by and is replaced by ) with the numeric coefficients 4.375, 1.952, The inflection point in the upper-left curve in Figure 7 occurs around 650. The results for the quadratic fit for the 12-months downloads are summarized in Table 7, so we have 16.78, 1.331, in the formula (4). 14 However, the curvature is negligible, so we can use a linear fit instead by setting = 0 in Eq. (4), for which the results are provided in Table 8, and we have ln + ln with 17.95, Data from SSRN Author Pages As mentioned above, the Top Authors data contains links to the 30,000 authors individual webpages. We downloaded these webpages in an automated fashion on 08/16/2015 and 08/17/ The data is essentially structured, with a few caveats. E.g., the Posted: date is not always shown, which complicates parsing. Also, the default ordering of the papers is by the decreasing number of downloads; however, occasionally this ordering is not followed with no clear pattern. Furthermore, papers that have been revised and are still under review by SSRN are moved to the bottom of the list with no Last Revised: date. However, each webpage has a field showing the total number of downloads, so a simple sanity check is that summing the number of paper downloads over all papers with non-zero/non-empty downloads fields should produce the total number of downloads. Out of 29,985 good webpages (see above), all but 256 satisfied this criterion with straightforward parsing. An additional heuristic further reduced this number to 78. We manually checked and patched the data for these remaining 78 pages on 08/18/2015. 
However, the laborious and time-consuming sourcing resulted in high quality data. 12 The following papers are the apparent outliers (the format is (authors(s), year), ): (Faber, 2007), 152,242; (Solove, 2007), 150,263; (Jensen and Meckling, 1976), 108,410; (Jackson, 2011), 96,296; (Fama, 1998), 83,174; (Girgis, George and Anderson, 2010), 73,622. We include these papers in the References so the reader can get a flavor on the spectrum of the topics and authors of the top downloaded papers (overall). 13 The coefficients and in this case are close to those for the overall total downloads (see above and Table 3). 14 The following papers are the apparent outliers (the format is (authors(s), year), ): (Francis and Mialon, 2014), ; (Jackson, 2011), 31476; (Bartlett, 2015), 30538; (Faber, 2007), 21531; (Fama and French, 2015), 12081; (Roche, 2011), Again, we include these papers in the References so the reader can get a flavor on the spectrum of the topics and authors of the top downloaded papers (12 mo). 15 The downloads take some time, even for a fast machine with a fast internet connection, which is what we used. We mention this to emphasize that the data is not 100% synchronized as, e.g., SSRN updates download counts in real-time. This asynchronicity is unavoidable with downloads; however, its effect on our analysis here is small. 7

8 For the reasons mentioned in Section 2, we drop all papers labeled as other papers (SSRN does not include downloads for such papers in the total download count), and also all papers with empty downloads fields. For each author we then have a vector, =1,, with =, same as the total number of downloads on the author s webpage. However, can be less than the total number of papers on the webpage as we omit the papers with empty downloads fields. Using this data we compute, and via Eqs. (1),(2) and (3). 3. Index Properties The number of papers (as defined above) across all authors in our database is 367,478. However, by definition, only a fraction of these papers contribute to the index : the number of such papers is simply a sum (across all authors) over the values of the integer index and turns out to be 112,793, or about 30.7%. Table 9 gives cross-sectional (across all authors in the SSRN Top Authors data) summaries for and the ratio = / (with NAs omitted). The =1 cases are rather ubiquitous, to wit, 8,572. However, these are mostly the authors with low paper counts. We have the following statistics for the number of occurrences of = according to the value: 3,326 for =1; 2,338 for =2; 1,842 for =3; 848 for =4; 205 for =5; 12 for =6; 0 for =7; 1 for =8; and 0 for =9 (see the histogram in Figure 8). The outlier = =8 corresponds to the author G. Feiger (SSRN ID ), whose =(47043,7971,7205,6438,6004,5607,5397,5276,3145). The index means that = papers have at least downloads each, i.e., we have and we can define the ratio = (7) We have 1. Summaries of and ln( ) are given in Table 9. The large values of are mostly due to the authors with low paper counts but substantial numbers of downloads. In Figure 8 we plot histograms for ln( ), and the quantity =ln( / )/ also summarized in Table 9. The significance of the quantity is that, if all (see Eq. (2)), then = and ln( ). 
The reason why the index works numerologically, that is, produces reasonable results, is that the bulk of the values of are of order 1. Had these values been much higher, most values of would equal and this index would be uninformative. The analog of the quantity for the h-index is / and this quantity is mostly large for SSRN downloads, which is precisely why the h-index does not work numerologically for SSRN downloads. In contrast, for citations the bulk of the values of / (here is the total number of citations) is around 3-5 for physics papers (Hirsch, 2005) focuses on, which is the reason the h-index works reasonably well numerologically for citations in that particular field. 8

3.1. An Alternative Index

Suppose an author has index $k$. This index knows nothing about the detailed structure of the downloads for the papers with $r \leq k$, only that $d_r \geq \exp(k)$ (we are assuming that the papers are ordered with decreasing $d_r$). To give more weight to the papers with more downloads, we can consider an alternative index, call it $q$ for the lack of a better name, for SSRN downloads for a given author:

$$q = \max_r \min(\tilde{c}_r, r) \tag{8}$$

$$\tilde{c}_r = \lfloor \ln(\bar{d}_r) \rfloor \tag{9}$$

$$\bar{d}_r = \frac{1}{r} \sum_{s=1}^{r} d_s \tag{10}$$

I.e., the integer-valued index $q$ is based on the average number of downloads for the first $r$ papers (as opposed to the number of downloads for the $r$-th paper). [16] As in Section 1, we can define a non-integer index $q'$ via (see Figure 9)

$$q' = \ln(q^*) = \frac{(q+1)\ln(\bar{d}_q) - q\,\tilde{y}}{1 + \ln(\bar{d}_q) - \tilde{y}} \tag{11}$$

Here $\tilde{y} = \ln(\bar{d}_{q+1})$ if $q < n$; otherwise, $\tilde{y} = q$. By construction, $q \leq q' < q+1$ and $\lfloor q' \rfloor = q$. The analog of in Eq. (7) is = / ; however, the analog of is trivial: for = we have =1 as = / in this case.

By construction, the number of papers that contribute to the index $q$ is higher (compared with the index $k$); it is simply a sum (across all authors) over the values of the integer index $q$ and turns out to be 128,494, or about 35.0%. Table 10 gives cross-sectional (across all authors in the SSRN Top Authors data) summaries for the index $q$ and related quantities, including the ratio $q/n$ (with NAs omitted). As above, the $q/n = 1$ cases are rather ubiquitous, to wit, 11,343, and mostly correspond to the authors with low paper counts. We have the following statistics for the number of occurrences of $q = n$ according to the $n$ value (currently, the maximum value of $q$ is 10): 3,326 for $n=1$; 2,395 for $n=2$; 2,206 for $n=3$; 1,868 for $n=4$; 1,141 for $n=5$; 356 for $n=6$; 42 for $n=7$; 8 for $n=8$; 1 for $n=9$; and 0 for $n=10$ (see the histogram in Figure 10). The outlier $q = n = 9$ corresponds to the author S. Zafron, whose 9 papers have the downloads vector $d = (21609, 21148, 17194, 11034, 8930, 1308, 1134, 256, 231)$. Table 11 lists the top 20 authors by the index $q$ (cf. Table 1 for the top 20 authors by the index $k$).

[16] This is analogous to the g-index (Egghe, 2006) for citations. Here we have (the integer part of) the log in Eq. (9), which makes all the difference. Just as the h-index, the g-index is uninformative when applied to SSRN downloads.
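The difference between the two integer indexes, one based on each paper's own downloads and one based on the running average over the first papers, can be checked on the downloads vector quoted above for S. Zafron. The sketch below is an illustration only; the function names are hypothetical and not from the paper.

```python
import math


def per_paper_index(downloads):
    """h-index rule on floor(ln(d_r)) of each paper's own downloads."""
    d = sorted((x for x in downloads if x > 0), reverse=True)
    return max((min(math.floor(math.log(x)), r)
                for r, x in enumerate(d, start=1)), default=0)


def average_based_index(downloads):
    """h-index rule on floor(ln(mean downloads of the top r papers))."""
    d = sorted((x for x in downloads if x > 0), reverse=True)
    best, running_total = 0, 0
    for r, x in enumerate(d, start=1):
        running_total += x
        log_average = math.floor(math.log(running_total / r))
        best = max(best, min(log_average, r))
    return best


zafron = [21609, 21148, 17194, 11034, 8930, 1308, 1134, 256, 231]
print(per_paper_index(zafron))      # 7
print(average_based_index(zafron))  # 9, equal to the number of papers
```

The five heavily downloaded papers keep the running average above exp(9) all the way to the ninth paper, so the average-based index reaches the paper count, whereas the per-paper version stops at 7 because the last two papers have only 5 log-downloads each.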

3.2. Are Our Indexes Informative?

One of the critiques of the h-index was set forth in (Yong, 2014). In a nutshell, it boils down to the fact that we can think of $C$ citations being partitioned into $n$ papers, and then the h-index is the side-length of the so-called Durfee square, which is the largest $h \times h$ square that fits into the so-called Young diagram for said partition (see, e.g., (Anderson, Hankin and Killworth, 2008)). For given $C$ and $n$ there is a finite range of values the h-index can take. Assuming equal probabilities (i.e., no additional information), we can define the expected value of the h-index. When $C$ is large, there is an asymptotic formula for this expected value (Canfield, Corteel and Savage, 1998), which Yong (2014) proposes to use as the rule-of-thumb estimate for the h-index:

$$h \approx \left(\sqrt{6}\,\ln(2)/\pi\right)\sqrt{C}$$

Yong then argues that the information in the h-index beyond what is already in $C$ is limited. In this regard, here we also should ask whether our index $k$ and the alternative index $q$ of Subsection 3.1 are informative (beyond what is encoded in the total number of downloads $D$). Since our indexes are based on logs of the numbers of downloads, the combinatorial tricks do not appear to be directly applicable. We will therefore take an empirical approach.

We plot the indexes $k$ and $q$ as well as $k'$ and $q'$ v. $\ln(D)$ in Figure 11. There is an apparent linear $\ln(D)$ component in these indexes. Linear regressions of the indexes over $\ln(D)$ with the intercept are summarized in Tables 12-15. Adding another explanatory variable, $\ln(n)$, improves the fits; see Table 16. Evidently, the dependence on $\ln(D)$ is not the end of the story: there is more information encoded in these indexes beyond what is already in $\ln(D)$.

3.3. Twelve-months Indexes

One shortcoming of the h-index is that, for a given author, it cannot decrease. A retired author can have a high h-value without writing a single new paper. By definition, the same applies to our indexes.
In the case of the h-index, one can implement a weighting scheme, whereby older papers are given less weight (Sidiropoulos, Katsaros and Manolopoulos, 2007). The same idea can be applied to our indexes. However, the data for the age of the downloads is not readily available, at least not publically, so any empirical analysis presently is out of reach. Nonetheless, not all is lost. We can compute our indexes for the last 12-months downloads. The author webpages do not separately provide the last 12-mo download data. We circumvent this difficulty by utilizing the Top Papers data, which contains the top 10,000 most downloaded papers for the last 12 months together with their last 12-mo download numbers. 17 For each 17 It also contains data for the overall downloads. However, the Top Papers data is ordered by the rank based on the last 12-mo downloads. This implies that this data may not contain the overall top 10,000 most downloaded papers. The same applies to the Top Authors data, which is also ordered by the rank based on the last 12-mo downloads. However, it is not unreasonable to focus on the papers that have been downloaded more recently. 10

11 author contributing to these 10, papers we can extract the author s papers with the last 12-mo download numbers and therefore compute our indexes. Summaries for the 12-mo and index values are given in Table 17. Tables 18 and 19 give the top 20 authors by the 12- mo and index values, respectively. The numbers of papers (column 6) in Tables 18 and 19 are lower than those in Table 1. It is not surprising that some papers do not make it to the top 10,000 most downloaded papers. This causes the total numbers of downloads in the last 12 months in Tables 18 and 19 to be lower than those reported in the Top Authors data (not shown), so the 12-mo rank in Tables 18 and 19 (column 7), which is based on the total numbers of downloads in Tables 18 and 19 (column 5), is not always the same as the 12-mo rank in the Top Authors data. However, we expect that the omitted low-downloads papers hardly affect the index values. The rank is not critical here and is shown solely for comparison purposes. We can analyze the time-dependence of our indexes in more detail using the SSRN author webpage data. It contains the Posted: field (with some missing cases see above), the date a paper was originally posted on SSRN. We parsed the 30,000 author webpages (see above) and for each author identified the earliest of the Posted: dates, which we use to measure, the authors SSRN career lengths in years (for simplicity we set 1 mo = 1/12 yr, 1 day = 1/30 mo). Plots of ln( ), ln( / ), and v. ln( ) are given in Figure There is no statistically significant relation between ln( / ) and ln( ); see Table On average, there is linear growth in ln( ), and (at the upper end of the values) with ln( ), which is expected; see Table 20. Adding ln( ) as a third explanatory variable in the regressions in Table 16, however, has a negligible effect on the fits. The ln( ) dependence is not the main driver Why Do Our Indexes Work Numerologically? In our definition of the index in Eqs. 
(1) and (2) (and, consequently, in our definitions of the indexes as well as and ) we chose the natural logarithm ln( ) as opposed to a logarithm log ( ) with another base. How come? The answer is rather prosaic. We chose the natural logarithm because it works well numerologically. Let us elaborate on this point. 18 One paper does not list the author(s) in Top Papers, so we end up with 9,999 papers with 11,871 authors. 19 We downloaded the 30,000 SSRN author webpages to extract the Posted: fields on August 28-29, This does not affect the actual values of ; however; there were more bad webpages, 27 instead of 15 (see above). 20 So that the statistic is meaningful, we take 1. After dropping NAs (see above), we have 28,552 datapoints. The summary for reads: Min = 1.003, 1st Quartile = 4.111; Median = 7.467; Mean = 8.415; 3rd Quartile = ; Max = The maximum corresponds to the author J. Pontiff (SSRN ID 17153), with a posting on 5/9/ Cf. the so-called -quotient (the h-index over the number of years); see (Hirsch, 2005). 11

Suppose we have $n$ objects (e.g., papers), each of which is characterized by a count of sorts (e.g., a number of citations or downloads, etc.). Let us call these counts $d_r$, $r = 1, \dots, n$. Let us further assume that the counts are exponential by nature (just as is the case with downloads), i.e., the cross-sectional distribution of the total counts $D = \sum_{r=1}^{n} d_r$ across the object owners (e.g., authors) is (quasi-)log-normal. Suppose we wish to construct an index along the lines of our index $k$. We can define this index via Eq. (1) with $c_r$ defined more generally as follows: [22]

$$c_r = \lfloor \lambda \ln(d_r) \rfloor \tag{12}$$

Here $\lambda$ is an overall normalization factor, which for SSRN downloads in Eq. (2) we have set to 1 for the reasons we will explain momentarily. More generally, $\lambda$ need not be 1. The choice of the base $b$ in the logarithm is then subsumed in $\lambda$ as $\log_b(d_r) = \ln(d_r)/\ln(b)$. I.e., the choice of the base of the logarithm is equivalent to the choice of the overall normalization factor. In the context of the h-index this is analogous to the $h_\alpha$-index of (Van Eck and Waltman, 2008); also see (Waltman and Van Eck, 2009). Our $\lambda$ in Eq. (12) plays a role analogous to that of the parameter in the $h_\alpha$-index.

So, what should we choose as our factor $\lambda$? There is no magic prescription here. There are two evident guiding principles: i) the resulting index $k$ (and also $q$ and all the other related indexes) should be informative, and ii) simplicity. E.g., if we choose $\lambda$ too high, the index $k$ will mostly equal the number of objects $n$ and thereby be uninformative. If we choose $\lambda$ too low, then the index $k$ will mostly equal 0 or 1 and thereby also be uninformative. A choice of $\lambda$ that avoids such extremes is such that the bulk of the values of the product $\lambda\gamma$ is of order 1, where, as above, $\gamma = \ln(D/n)/n$. E.g., we can set $\lambda = 1/\mathrm{median}(\gamma)$, albeit this is not the only choice. For the overall SSRN downloads the bulk of the values of $\gamma$ is of order 1 (see Table 9), which is why we have chosen $\lambda = 1$ in Eq. (2), or, equivalently, the natural logarithm and not any other base.
As mentioned above, this is the numerological reason why our indexes work well for the overall SSRN downloads. Also, while, e.g., $\mathrm{median}(\gamma) \approx 0.7$ (see Table 9), we have chosen $\lambda = 1$ based on a further consideration of simplicity, so that only each author's data is required to compute his/her indexes (and no cross-sectional data across a sample of authors).

Here the following remark is in order. Basing $\lambda$ on the quantity $\gamma = \ln(D/n)/n$ makes sense only if $\ln(D/n)$ is essentially normally distributed and the data is not inundated with $n = 1$ (and, more generally, low $n$) datapoints. Our dataset based on the Top Authors data satisfies these criteria: as we discussed above, the density of $\ln(D/n)$ for the overall downloads is almost Gaussian (see Figure 4); among the 29,979 datapoints (there are 29,985 good webpages (see above), 6 of which contain no papers) there are only 3,326 datapoints with $n = 1$ and 2,399 datapoints with $n = 2$; and the paper count statistics are reasonable (see Table 2). If $\ln(D/n)$ itself has a highly skewed distribution or the data mostly contains low $n$ points, then blindly relying on $\gamma$ would produce nonsensical results. E.g., $\mathrm{median}(\gamma) \approx 5.3$ for the 9,999 papers discussed in Subsection 3.3 based on the Top Papers data, despite the fact that the bulk of the 12-mo download numbers are roughly 5-7 times lower than the bulk of the overall download numbers. This is due to the fact that the majority of the 11,871 authors of these 9,999 papers have $n = 1$ and $n = 2$ (see Table 17). If we remove the $n = 1$ and $n = 2$ datapoints, happily we are left with only 1,310 authors with the bulk of the values of $\gamma$ of order 1 (to wit, $\mathrm{median}(\gamma) \approx 1.4$).

[22] Let us note that rescaling $d_r$ by some factor would merely shift the range of values of the index.

3.5. Can We Apply Our Indexes to Citations?

As mentioned above, even citations are essentially exponential by nature. In this regard, we believe it would make sense to apply our ideas here to citations as well. This topic, interesting in its own right, is outside the scope of this paper, so we will not delve into it too deeply and will only give a bird's-eye view. A detailed empirical analysis would be required to see if it works.

If we apply, say, the index $k$ as defined in Eqs. (1) and (2) directly to citations, it may not work as well numerologically. This is because the numbers of citations are lower than the numbers of SSRN downloads. Thus, as of 9/4/2015, M.C. Jensen has the most SSRN downloads, to wit, 830,936, while according to INSPIRE (see above) E. Witten's (high energy physics) total number of citations is 118,374 with a total of 332 citable papers. [23] As of 9/4/2015, (Faber, 2007) is the most downloaded SSRN paper with 158,804 downloads, while the most cited paper in high energy physics (per INSPIRE) is (Maldacena, 1997) with 10,996 citations.
As is customary in high energy physics, we exclude (Particle Data Group Collaboration, 2014) with 50,004 citations as of the end of 2014, which is a handbook of elementary particles and traditionally garners most citations in high energy physics year after year. Based on the above numbers, we can superficially estimate that there is roughly orders of magnitude difference between SSRN downloads and citations, albeit a more detailed analysis (which is outside of the scope of this paper) would be required to get more precise bulk numbers. In any event, if we apply our indexes with =1 to citations, we can expect that they will produce reasonable results at the higher end (i.e., for highly cited authors and papers), and a nontrivial might be required to have informative indexes for lower citation count trenches. Our log-based indexes should be applicable beyond SSRN downloads and citations, e.g., for internet media downloads, which apparently are also exponential in nature (cf. (Hoven, 2013)). However, depending on a type of downloads and the bulk values thereof, the factor in Eq. (12) may have to be chosen away from 1 for our indexes to work well numerologically. 23 Using Harzing s Publish or Perish software (version ) with Google Scholar as the data source gives inflated figures: 161,969 citations and 574 papers. In our experience, INSPIRE does undercount citations, though. 13
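As a rough check on the size of the gap, the four figures quoted above can be compared on a base-10 log scale. The sketch below uses only those four numbers; the variable names are illustrative, and two data points at the very top of each distribution are of course no substitute for the bulk analysis the text calls for.

```python
import math

# Figures quoted in the text (as of 9/4/2015).
top_author_downloads = 830_936   # M.C. Jensen, SSRN downloads
top_author_citations = 118_374   # E. Witten, INSPIRE citations
top_paper_downloads = 158_804    # (Faber, 2007), SSRN downloads
top_paper_citations = 10_996     # (Maldacena, 1997), INSPIRE citations

# Gap in orders of magnitude = log10 of the ratio.
author_gap = math.log10(top_author_downloads / top_author_citations)
paper_gap = math.log10(top_paper_downloads / top_paper_citations)

print(f"author-level gap: {author_gap:.2f} orders of magnitude")
print(f"paper-level gap:  {paper_gap:.2f} orders of magnitude")
```

In natural-log terms, which is what the indexes use, a gap of $g$ orders of magnitude shifts $\ln$ of a count by $g \ln(10) \approx 2.3\,g$.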

4. Conclusions

Let us start by tying up a loose end, so to speak. In the empirical regression in Table 6 we used the Top Papers data, which only contains 10,000 datapoints. We did so for illustrative purposes, as downloading this data, which amounts to downloading only 100 webpages, is much less arduous than downloading 30,000 individual author webpages. However, the latter already contain the data (much more of it, 367,478 papers) required in the regression in Table 6. We give the results for this regression based on the author webpage data in Table 21, which are qualitatively similar to the results in Table 6, and we still have the empirical formula (4) (see footnote 24). As mentioned in Table 17, the Top Papers data sample is actually small, despite the 9,999 papers it contains, so any results obtained using the Top Papers data should be taken with a grain of salt.

24 With the coefficient values replaced by those in Table 21. The inflection point in the curve of the rank v. the log of the number of downloads occurs at around 70. Let us note that the author webpage data contains the numbers of overall downloads for each paper by each of the 30,000 authors, but not for the last 12 months. So, we have no choice but to use the Top Papers data for the regressions in Tables 7 and 8.

Let us now discuss possible variations and generalizations of our indexes; this topic invariably overlaps with caveats. E.g., as in the case of the h-index, typical values of our indexes naturally will vary from discipline to discipline. One way to deal with this is to simply compute the indexes separately for each discipline. As mentioned above, SSRN provides the Top Authors (but not the Top Papers) data broken down by some disciplines (law, business and economics), with such data for other disciplines (to wit, accounting and finance) apparently forthcoming. It would be interesting to statistically analyze our indexes by discipline (see footnote 25). One way the disparity in the h-index across different disciplines has been dealt with is via normalizing the h-index (or the citations) within each discipline (e.g., via dividing by a mean for the discipline) (see footnote 26). A similar approach can be applied to our indexes as well.

25 We have refrained from doing so here as the breakdown by all disciplines is currently unavailable. Let us note that the author webpage data can also be split by discipline as the SSRN IDs are obtained via the Top Authors data.

26 See, e.g., (Anauati, Galiani, and Gálvez, 2014), (Batista et al, 2006), (Bornmann and Daniel, 2008), (Iglesias and Pecharromán, 2007), (Kaur, Radicchi and Menczer, 2013), (Podlubny, 2005), (Podlubny and Kassayova, 2006).

So, can we simply rescale SSRN downloads by some factor and apply the h-index to the rescaled downloads (as, e.g., for the internet media in (Hovden, 2013))? A natural choice for such a factor is, e.g., a cross-sectional median based on the Top Authors data (the distribution is too skewed to use the mean, which is much higher). However, if we apply the h-index to the downloads divided by 30, expectedly, we get an index with a highly skewed distribution. There is no escaping the fact that SSRN downloads are exponential by nature, as we hopefully convincingly argued above based on the empirical data. Logs of the numbers of downloads, not the numbers of downloads themselves, are a natural measure. And this is essentially our key observation. Once we accept this fact, the indexes we propose are natural, with possible tweaks (e.g., the linear interpolation between two adjacent points in Eq. (3) can be tweaked, but this is all minutiae).

Just as the h-index, our indexes use only each author's data, but no cross-sectional (across a sample of authors) data, which makes them easy to compute. Rescaling by some factor that requires cross-sectional data to compute would complicate the calculation of an index. And, once again, such rescaling does nothing to address the exponential nature of SSRN downloads, while our indexes are built on that very premise.

Several variations of and indexes complementary to the h-index have been proposed, e.g., the g-index (Egghe, 2006), which has an analog for SSRN downloads among our indexes. A geometric mean of the h-index and the g-index, the so-called hg-index (Alonso et al., 2010), is a composite index (also see, e.g., (Franceschini and Maisano, 2011)). An evident application to our indexes would be to consider geometric means of the corresponding pairs of our indexes.

Another point worth mentioning relates to the overall normalization factor we introduced in Subsection 3.4. In Eq. (12) it is implicitly assumed to be a constant. However, there is no reason (other than forgoing simplicity, that is) why we could not consider a non-constant normalization factor in Eq. (12) instead, i.e., some (in many cases, likely relatively slowly varying) function.

Self-citations affect the total number of citations as well as the h-index. For citations this is relatively simple to deal with: one can simply remove self-citations. E.g., INSPIRE provides such functionality. Analogously, self-downloads can be a nuisance for SSRN downloads (see, e.g., (Edelman and Larkin, 2014)) and thereby for our indexes. Dealing with self-downloads is harder, perhaps even internally at SSRN. This is a caveat.
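Returning to the rescaling question raised earlier in this section, the contrast between rescaling downloads and taking their logs can be sketched in a few lines. The example assumes the standard h-index definition (the largest $h$ such that $h$ papers have at least $h$ counts each) and, as a labeled assumption, a log-based analog (the largest $i$ with $\ln(D_i) \geq i$ for counts $D_i$ sorted decreasingly). The divisor 30 echoes the rescaling mentioned above; the two download lists are made up, and the sketch does not reproduce the skewness finding itself.

```python
import math

def h_index(counts):
    """Standard h-index: largest h with h items having at least h counts."""
    ordered = sorted(counts, reverse=True)
    # The condition c >= i holds for a prefix of the sorted list only,
    # so counting the hits gives the largest such i.
    return sum(1 for i, c in enumerate(ordered, start=1) if c >= i)

def log_index(counts):
    """Assumed log-based analog: largest i with ln(D_i) >= i."""
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    return sum(1 for i, c in enumerate(ordered, start=1) if math.log(c) >= i)

# Hypothetical authors; author B has 100 times the downloads of author A.
author_a = [9000, 6000, 3000, 1500, 900, 600, 300, 150, 90, 60]
author_b = [100 * d for d in author_a]

rescaled_h_a = h_index([d / 30 for d in author_a])
rescaled_h_b = h_index([d / 30 for d in author_b])
log_a = log_index(author_a)
log_b = log_index(author_b)

print(rescaled_h_a, rescaled_h_b, log_a, log_b)
```

In this toy case the rescaled h-index of the more downloaded author saturates at the number of papers, while the log-based value still has headroom; a single fixed divisor cannot track counts that span several orders of magnitude.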
In addition, some papers are not available for download from SSRN (due to journal policies, see above, with exceptions often granted to established authors).

Naturally, as with any index, there are caveats. Nonetheless, our indexes are novel, and it would be interesting if SSRN could analyze and perhaps even implement them.

Acknowledgments

I would like to thank Ludo Waltman for reading a draft of the manuscript and for valuable comments and suggestions that have helped improve it, and Blaise Cronin for encouragement.

References

Alonso, S.; Cabrerizo, F.; Herrera-Viedma, E.; Herrera, F. (2010) h-index: A review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics 3(4):
Alonso, S.; Cabrerizo, F.; Herrera-Viedma, E.; Herrera, F. (2010) hg-index: a new index to characterize the scientific output of researchers based on the h- and g-indices. Scientometrics 82(2):
Anauati, M.V.; Galiani, S.; Gálvez, R.H. (2014) Quantifying the Life Cycle of Scholarly Articles Across Fields of Economic Research. Available at SSRN:
Anderson, T.; Hankin, R.; Killworth, P. (2008) Beyond the Durfee square: Enhancing the h-index to score total publication output. Scientometrics 76(3):
Bartlett, B. (2015) How Fox News Changed American Media and Political Dynamics. Available at SSRN:
Batista, P.D.; Campiteli, M.G.; Kinouchi, O.; Martinez, A.S. (2006) Is it possible to compare researchers with different scientific interests? Scientometrics 68(1):
Bornmann, L.; Daniel, H. (2007) Convergent validation of peer review decisions using the h index: Extent of and reasons for type I and type II errors. Journal of Informetrics 1(3):
Bornmann, L.; Daniel, H. (2008) What do citation counts measure? A review of studies on citing behavior. Journal of Documentation 64(1):
Bornmann, L.; Mutz, R.; Daniel, H. (2008) Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine. Journal of the American Society for Information Science and Technology 59(5):
Canfield, E.R.; Corteel, S.; Savage, C.D. (1998) Durfee polynomials. The Electronic Journal of Combinatorics 5 (1998) #R32.
Deineko, V.G.; Woeginger, G.J. (2009) A new family of scientific impact measures: The generalized Kosmulski-indices. Scientometrics 80(3):
Edelman, B.G.; Larkin, I. (2014) Social Comparisons and Deception Across Workplace Hierarchies: Field and Experimental Evidence. Organization Science (Forthcoming). Available at SSRN:
Egghe, L. (2006) Theory and practice of the g-index. Scientometrics 69(1):
Egghe, L. (2008) Mathematical theory of the h- and g-index in case of fractional counting of authorship. Journal of the American Society for Information Science and Technology 59(10):
Egghe, L. (2010) The Hirsch index and related impact measures. Annual Review of Information Science and Technology 44(1):
Egghe, L.; Rousseau, R. (2008) An h-index weighted by citation impact. Information Processing & Management 44(2):
Faber, M.T. (2007) A Quantitative Approach to Tactical Asset Allocation. Journal of Wealth Management 9(4): Available at SSRN:
Fama, E.F. (1998) Market Efficiency, Long-Term Returns, and Behavioral Finance. Journal of Financial Economics 49(3): Available at SSRN:
Fama, E.F.; French, K.R. (2015) A Five-Factor Asset Pricing Model. Journal of Financial Economics 116(1): Available at SSRN:
Franceschini, F.; Maisano, D. (2011) Criticism on the hg-index. Scientometrics 86(2):
Francis, A.M.; Mialon, H.M. (2014) A Diamond is Forever and Other Fairy Tales: The Relationship between Wedding Expenses and Marriage Duration. Available at SSRN:
Girgis, S.; George, R.; Anderson, R.T. (2010) What is Marriage? Harvard Journal of Law and Public Policy 34(1): Available at SSRN:
Hirsch, J.E. (2005) An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences 102(46):
Hovden, R. (2013) Bibliometrics for Internet media: Applying the h-index to YouTube. Journal of the American Society for Information Science and Technology 64(11):
Hua, P.-h.; Rousseau, R.; Sun X.-k.; Wan, J.-k. (2009) A Download h(2)-Index as a Meaningful Usage Indicator of Academic Journals. In: Larsen, B.; Leta, J. (eds.) ISSI 2009: 12th International Conference on Scientometrics and Informetrics. Rio de Janeiro: BIREME and Federal University of Rio de Janeiro, pp.
Iglesias, J.E.; Pecharromán, C. (2007) Scaling the h-index for Different Scientific ISI Fields. Scientometrics 73(3):
Imperial, J.; Rodriguez-Navarro, A. (2007) Usefulness of Hirsch's h-index to evaluate scientific research in Spain. Scientometrics 71(2):
Jackson, M.O. (2011) A Brief Introduction to the Basics of Game Theory. Available at SSRN:
Jarvelin, K.; Persson, O. (2008) The DCI index: discounted cumulated impact-based research evaluation. Journal of the American Society for Information Science and Technology 59(9):
Jensen, M.C.; Meckling, W.H. (1976) Theory of the Firm: Managerial Behavior, Agency Costs and Ownership Structure. Journal of Financial Economics 3(4): Available at SSRN:
Jin, B. (2006) H-Index: An evaluation indicator proposed by scientist. Science Focus 1(1): 8-9.
Jin, B. (2007) The AR-index: Complementing the h-index. ISSI Newsletter 3(1): 6.
Jin, B.H.; Liang, L.L.; Rousseau, R.; Egghe, L. (2007) The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin 52(6):
Kaur, J.; Radicchi, F.; Menczer, F. (2013) Universality of scholarly impact metrics. Journal of Informetrics 7(4):
Kosmulski, M. (2006) A new Hirsch-type index saves time and works equally well as the original h-index. International Society for Scientometrics and Informetrics Newsletter 2(3): 4-6.
Levitt, J.M.; Thelwall, M. (2007) Two new indicators derived from the h-index for comparing citation impact: Hirsch frequencies and the normalized Hirsch index. In: Torres-Salinas, D.; Moed, H.F. (eds.) Proceedings of the 11th Conference of the International Society for Scientometrics and Informetrics, Vol 2. Madrid, Spain: Spanish Research Council (CSIC), pp.
Lundberg, J. (2007) Lifting the crown - citation z-score. Journal of Informetrics 1(2):
Maldacena, J.M. (1997) The Large N limit of superconformal field theories and supergravity. (Nov. 1997); Adv. Theor. Math. Phys. 1998, 2(2): ; Int. J. Theor. Phys. 1999, 38(4):
Norris, M.; Oppenheim, C. (2010) The h-index: a broad review of a new bibliometric indicator. Journal of Documentation 66(5):
Particle Data Group Collaboration (Olive, K.A. et al.) (2014) Review of Particle Physics. Chin. Phys. C38: , 1676 pp.
Podlubny, I. (2005) Comparison of scientific impact expressed by the number of citations in different fields of science. Scientometrics 64(1):
Podlubny, I.; Kassayova, K. (2006) Law of the constant ratio. Towards a better list of citation superstars: compiling a multidisciplinary list of highly cited researchers. Research Evaluation 15(3):
Roche, C.O. (2011) Understanding the Modern Monetary System. Available at SSRN:
Rousseau, R. (2006) Simple models and the corresponding h- and g-index. Available at E-LIS:
Rousseau, R. (2014) A note on the interpolated or real-valued h-index with a generalization for fractional counting. Aslib Journal of Information Management 66(1):
Rousseau, R.; Ye, F.Y. (2008) A proposal for a dynamic h-type index. Journal of the American Society for Information Science and Technology 59(11):
Ruane, F.; Tol, R.S.J. (2008) Rational (successive) h-indices: An application to economics in the Republic of Ireland. Scientometrics 75(2):
Schreiber, M. (2008) A modification of the h-index: The h_m-index accounts for multi-authored manuscripts. Journal of Informetrics 2(3):
Sidiropoulos, A.; Katsaros, C.; Manolopoulos, Y. (2007) Generalized h-index for Disclosing Latent Facts in Citation Networks. Scientometrics 72(2):
Solove, D.J. (2007) "I've Got Nothing to Hide" and Other Misunderstandings of Privacy. San Diego Law Review 44(4): Available at SSRN:
Stringer, M.J.; Sales-Pardo, M.; Amaral, L.A.N. (2008) Effectiveness of Journal Ranking Schemes as a Tool for Locating Information. PLoS One 3(2): e1683.
Van Eck, N.J.; Waltman, L. (2008) Generalizing the h- and g-indices. Journal of Informetrics 2(4):
Waltman, L.; Van Eck, N.J. (2009) A simple alternative to the h-index. ISSI Newsletter 5(3):
Yong, A. (2014) Critique of Hirsch's Citation Index: A Combinatorial Fermi Problem. Notices of the American Mathematical Society 61(9):

Tables

Author Name, SSRN ID; Total # of Downloads; # of >0 Papers; SSRN Rank
Michael C. Jensen; Eugene F. Fama; Pablo Fernandez; Kenneth R. French; Attilio Meucci; Aswath Damodaran; William N. Goetzmann; Lucian A. Bebchuk; Shahin Shojai; Kostas Koufopoulos; Andrew W. Lo; Daniel Kaufmann; Nassim Nicholas Taleb; Werner Erhard; Christian Leuz; Stephen H. Penman; Bernard S. Black; Aart Kraay; Ignacio Velez-Pareja; Campbell R. Harvey

Table 1. Top 20 SSRN authors by the index. The 6th column is the number of papers with at least 1 download. The 2nd column is rounded down to the 3rd decimal. The 4th column is rounded down to the nearest integer. The SSRN rank is based on the total number of downloads. All statistics are as of the date(s) of our downloads of the data (see Section 2).

Quantity; Min.; 1st Quartile; Median; Mean; 3rd Quartile; Max.
Rows: total number of downloads, its log, total number of papers, its log, downloads per paper, its log (each overall and for the last 12 months); ratio of the overall to the 12-mo total number of downloads, and its log.

Table 2. Cross-sectional (across all authors in the SSRN Top Authors data) summaries for the total number of downloads and its log, the total number of papers and its log, and downloads-per-paper and its log, both overall and for the last 12 months. In the log of the total number of downloads for the last 12 months we drop all zero-download cases. We give the numbers as rounded by R. E.g., the maximum numbers of downloads overall and in the last 12 months actually are and , respectively. The downloads-per-paper figures are already rounded to the nearest integer in the SSRN Top Authors data. The bottom two rows summarize the ratio of the overall total number of downloads to the 12-mo total number of downloads and the log of this ratio.

Regression of the log of the rank on the log of the number of downloads and its square: Estimate; Standard error; t-statistic; Overall statistics
Rows: Intercept; log of downloads; squared log of downloads; Multiple/Adjusted R-squared; F-statistic

Table 3. Summary (using the function summary(lm()) in R) for the cross-sectional (over all authors in the SSRN Top Authors data) polynomial regression of the log of the rank over the log of the total number of downloads and its square, with the intercept. The regression formula reads lm(y ~ x + I(x^2)) in R notation, where y is the log of the rank and x is the log of the total number of downloads. Here the rank and the number of downloads are based on the overall downloads. We keep the outliers in the regression so as not to inflate the statistics.
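The regression in Table 3 fits the log of the rank as a quadratic polynomial in the log of the number of downloads, which is what makes the rank a Gaussian function of the log of the number of downloads. The sketch below mirrors `lm(y ~ x + I(x^2))` with an ordinary least-squares fit written out in plain Python. The helpers `fit_quadratic` and `predicted_rank`, the coefficient values, and the synthetic data are all hypothetical; none of them are the paper's estimates.

```python
import math

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    # Power sums s[p] = sum(x^p) and moments t[p] = sum(y * x^p).
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    # Augmented 3x3 system: sum_c s[r + c] * coef[c] = t[r].
    m = [[s[r + c] for c in range(3)] + [t[r]] for r in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def predicted_rank(downloads, coef):
    """Rank implied by ln(rank) = a + b*ln(D) + c*ln(D)^2."""
    a, b, c = coef
    x = math.log(downloads)
    return math.exp(a + b * x + c * x * x)

# Synthetic, noise-free data generated from made-up coefficients.
true_coef = (10.0, 0.5, -0.1)
xs = [2.0 + 0.5 * i for i in range(20)]           # x = ln(downloads)
ys = [true_coef[0] + true_coef[1] * x + true_coef[2] * x * x for x in xs]

a_hat, b_hat, c_hat = fit_quadratic(xs, ys)
print(a_hat, b_hat, c_hat)
```

With real data one would normally reach for a statistics package rather than hand-rolled normal equations; the explicit version is used here only to keep the example dependency-free.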

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept, the log of downloads and its square.]

Table 4. Same as Table 3 with the rank and the number of downloads based on the last 12-months downloads.

Quantity; Min.; 1st Quartile; Median; Mean; 3rd Quartile; Max.
Rows: number of downloads and its log, overall and for the last 12 months.

Table 5. Cross-sectional (across all papers in the SSRN Top Papers data) summaries for the number of downloads and its log, both overall and for the last 12 months. We give the numbers as rounded by R. E.g., the maximum numbers of downloads overall and in the last 12 months actually are and 31862, respectively.

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept, the log of downloads and its square.]

Table 6. Summary (using the function summary(lm()) in R) for the cross-sectional (over all papers in the SSRN Top Papers data) polynomial regression of the log of the paper rank over the log of the paper's number of downloads and its square, with the intercept. The regression formula reads lm(y ~ x + I(x^2)) in R notation, where y is the log of the rank and x is the log of the number of downloads. Here the rank and the number of downloads are based on the overall downloads. We keep the outliers in the regression so as not to inflate the statistics.

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept, the log of downloads and its square.]

Table 7. Same as Table 6 with the rank and the number of downloads based on the last 12-months downloads.

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept and the log of downloads.]

Table 8. Summary (using the function summary(lm()) in R) for the cross-sectional (over all papers in the SSRN Top Papers data) linear regression of the log of the paper rank over the log of the paper's number of downloads, with the intercept. The regression formula reads lm(y ~ x) in R notation, where y is the log of the rank and x is the log of the number of downloads. Here the rank and the number of downloads are based on the last 12-months downloads. We keep the outliers in the regression so as not to inflate the statistics.

Quantity; Min.; 1st Quartile; Median; Mean; 3rd Quartile; Max.

Table 9. Cross-sectional (across all authors in the SSRN Top Authors data) summaries for and the ratio / together with the factor (see Eq. (7)), its log and . NAs are omitted.

Quantity; Min.; 1st Quartile; Median; Mean; 3rd Quartile; Max.

Table 10. Same as Table 9 for the indexes and (except there is no nontrivial analog of in this case, see Subsection 3.1). The factor is defined in Subsection 3.1.

Author Name, SSRN ID; Total # of Downloads; # of >0 Papers; SSRN Rank
Michael C. Jensen; Eugene F. Fama; Pablo Fernandez; Daniel J. Solove; Kenneth R. French; William H. Meckling; Daniel Kaufmann; Aart Kraay; Massimo Mastruzzi; Nassim Nicholas Taleb; Andrew Metrick; Werner Erhard; John R. Lott; Matthew O. Jackson; Kostas Koufopoulos; Stephen H. Penman; William N. Goetzmann; Attilio Meucci; Aswath Damodaran; K. Geert Rouwenhorst

Table 11. Top 20 SSRN authors by the index. The 6th column is the number of papers with at least 1 download. The 2nd column is rounded down to the 3rd decimal. The 4th column is rounded down to the nearest integer. The SSRN rank is based on the total number of downloads. All statistics are as of the date(s) of our downloads of the data (see Section 2).

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept and ln( ).]

Table 12. Summary (using the function summary(lm()) in R) for the cross-sectional (over all authors in the SSRN Top Authors data) linear regression of over ln( ) with the intercept.

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept and ln( ).]

Table 13. Summary (using the function summary(lm()) in R) for the cross-sectional (over all authors in the SSRN Top Authors data) linear regression of over ln( ) with the intercept.

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept and ln( ).]

Table 14. Summary (using the function summary(lm()) in R) for the cross-sectional (over all authors in the SSRN Top Authors data) linear regression of over ln( ) with the intercept.

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept and ln( ).]

Table 15. Summary (using the function summary(lm()) in R) for the cross-sectional (over all authors in the SSRN Top Authors data) linear regression of over ln( ) with the intercept.

[Regression table, one column per index: Estimate, Standard Error and t-statistic for the intercept and the two explanatory variables; Multiple/Adjusted R-squared; F-statistic.]

Table 16. Summaries (using the function summary(lm()) in R) for the cross-sectional (over all authors in the SSRN Top Authors data) linear regressions of each of our four indexes over the two explanatory variables ln( ) and ln( ) with the intercept. Cf. Tables 12-15.

Quantity; Min.; 1st Quartile; Median; Mean; 3rd Quartile; Max.
Rows: two indexes (12 months), each for all authors, for authors with >1 papers, and for authors with >2 papers.

Table 17. Cross-sectional (across all authors in the SSRN Top Papers data) summaries for the and indexes based on the last 12-months downloads obtained using the Top Papers data. Since we only have 9,999 papers with 11,871 authors (see Subsection 3.3), i.e., the data sample is in fact small despite a large number of papers, the data mostly has authors with only 1 or 2 papers. This causes the bulk of the index values to be artificially low (and the primary cause of this is not the fact that the bulk of the 12-mo download numbers is lower than the bulk of the overall download numbers by roughly a factor of 5-7; see the bottom two rows in Table 2). Therefore, we provide summaries for all 11,871 authors, the 2,834 authors with >1 papers, and the 1,310 authors with >2 papers.

Author Name, SSRN ID; Total # of Downloads (12 mo); # of Papers (12 mo); Downloads Rank (12 mo)
Pablo Fernandez; Michael C. Jensen; Aswath Damodaran; Attilio Meucci; Eugene F. Fama; Tobias J. Moskowitz; Campbell R. Harvey; Werner Erhard; Isabel Fernández Acín; Kenneth R. French; Lasse Heje Pedersen; Dan M. Kahan; Matthew O. Jackson; Cass R. Sunstein; George Serafeim; Clifford S. Asness; John R. Graham; Kari L. Granger; Guofu Zhou; Wade D. Pfau

Table 18. Top 20 SSRN authors by the index based on the last 12-months downloads obtained using the Top Papers data. See Table 1 for number rounding and other information.

Author Name, SSRN ID; Total # of Downloads (12 mo); # of Papers (12 mo); Downloads Rank (12 mo)
Pablo Fernandez; Matthew O. Jackson; Mebane T. Faber; Michael C. Jensen; Aswath Damodaran; Tobias J. Moskowitz; Eugene F. Fama; Clifford S. Asness; Kenneth R. French; Campbell R. Harvey; Isabel Fernández Acín; Pablo Linares; Lasse Heje Pedersen; Wade D. Pfau; Werner Erhard; Andrea Frazzini; Daniel J. Solove; Cass R. Sunstein; Attilio Meucci; Kari L. Granger

Table 19. Top 20 SSRN authors by the index based on the last 12-months downloads obtained using the Top Papers data. See Table 1 for number rounding and other information.

[Regression table, one column per dependent variable: Estimate, Standard Error and t-statistic for the intercept and ln( ); Multiple/Adjusted R-squared; F-statistic.]

Table 20. Summaries (using the function summary(lm()) in R) for the cross-sectional (over all authors in the SSRN Top Authors data with 1) linear regressions of over ln( ) with the intercept, where =ln( ), ln( / ),,. Here is the time (in years) from the date of the author's first posting of a paper ("Posted:" field) on SSRN until August 17,

[Regression table: Estimate, Standard error, t-statistic and overall statistics (Multiple/Adjusted R-squared, F-statistic) for the intercept, the log of downloads and its square.]

Table 21. Same as Table 6, except all quantities are based on the author webpage data.

Figures

Figure 1. This figure illustrates the computation of our indexes for a randomly chosen author. The sloping diamonds correspond to the logs of the download counts of the author's papers. The horizontal circles correspond to the quantity defined in Eq. (2). The solid straight line has slope 1, and its intersection with the horizontal lines of circles gives the integer index value of 5. The dotted straight line with the negative slope goes through two adjacent points of the log-download curve (at the integer index value and at the next paper), and its intersection with the solid straight line determines the real-valued index. In this example one of the resulting real-valued index values is 5.710.
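The geometric construction in Figure 1 can be sketched numerically. The code below assumes, as an illustration only, that the integer index is the largest $i$ with $\ln(D_i) \geq i$ (download counts $D_i$ sorted decreasingly) and that the real-valued index is the abscissa at which the straight line through the points at $i$ and $i+1$ crosses the slope-1 line. The function names and sample counts are hypothetical, and the paper's Eq. (3) may differ in detail.

```python
import math

def integer_index(log_counts):
    """Largest i with y_i >= i for decreasingly sorted log-counts y_i."""
    return sum(1 for i, y in enumerate(log_counts, start=1) if y >= i)

def interpolated_index(counts):
    """Real-valued index from the crossing with the slope-1 line (assumed form)."""
    ys = sorted((math.log(c) for c in counts if c > 0), reverse=True)
    k = integer_index(ys)
    if k == 0 or k == len(ys):
        # No pair of adjacent points straddles the slope-1 line.
        return float(k)
    y_k, y_next = ys[k - 1], ys[k]
    # Line through (k, y_k) and (k + 1, y_next) meets y = x at x = k + t,
    # where y_k + (y_next - y_k) * t = k + t.
    t = (y_k - k) / (1.0 + y_k - y_next)
    return k + t

# Hypothetical download counts for one author (not real data).
downloads = [5000, 1200, 800, 300, 150, 90, 40, 12, 5, 3]
value = interpolated_index(downloads)
print(value)
```

Since $y_k \geq k$ and $y_{k+1} < k + 1$, the offset $t$ always lies in $[0, 1)$, so the real-valued result stays between the integer index and the next integer.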

Figure 2. Cross-sectional (across all authors in the SSRN Top Authors data) density for ln(D) and histogram for log(D), where D is the total number of downloads for each author. Left column: overall; right column: the last 12 months.

Figure 3. Cross-sectional (across all authors in the SSRN Top Authors data) density for ln(D/N) and histogram for log(D/N), where D is the total number of downloads for each author, while N is the number of papers. Left column: overall; right column: the last 12 months.

Figure 4. The solid line (not a Gaussian) is the same as the upper-left density curve in Figure 3 (mean = and standard deviation = based on ln(D/N), with maximum value = based on the density) for overall downloads. The diamonds correspond to the least-squares fit Gaussian curve (mean = 5.329, standard deviation = 0.961, maximum value = 0.409, all based on the fit).

Figure 5. Downloads rank and its log v. the log of the total number of downloads. Left column: overall; right column: the last 12 months. See Subsection 2.1 for details.

Figure 6. Cross-sectional (across all papers in the SSRN Top Papers data) density for ln( ) and histogram for log, where is the number of downloads for each paper. Left column: overall; right column: the last 12 months.

Figure 7. Downloads rank and its log v. the log of the number of downloads for each paper. Left column: overall; right column: the last 12 months. See Subsection 2.2 for details.

Figure 8. Upper-left: histogram of the index; upper-right: histogram of the number of papers with =1 (where = / ); lower-left: histogram of ln( ) (where is defined in Eq. (7)); lower-right: histogram of =ln( / ) /. See Section 3 for details.

Figure 9. This figure illustrates the computation of the and indexes for the same randomly chosen author as in Figure 1. The sloping diamonds correspond to the log of the average number of downloads for the first $i$ papers (the papers are ordered decreasingly by their numbers of downloads). The horizontal circles correspond to the quantity defined in Eq. (9). The solid straight line has slope 1, and its intersection with the horizontal lines of circles gives the integer index value of 6. The dotted straight line with the negative slope goes through two adjacent points of the curve (at the integer index value and at the next paper), and its intersection with the solid straight line determines the real-valued index.

Figure 10. Upper-left: histogram of the index; upper-right: histogram of the number of papers with =1 (where = / ); lower-left: histogram of ln( ) (where is defined in Subsection 3.1); lower-right: density of ln( / ), where excludes all papers with empty fields (so the latter do not alter the density curve shape, cf. Figure 3). See Subsection 3.1 for details.

Figure 11. Each of the four panels (upper-left, upper-right, lower-left, lower-right) shows one of our four indexes. Straight lines correspond to linear fits to the data (see Tables 12-15).

Figure 12. Upper-left: ln( ) v. ln( ); upper-right: ln( / ) v. ln( ); lower-left: v. ln( ); lower-right: v. ln( ). Here is the time (in years) from the date of the author's first posting of a paper ("Posted:" field) on SSRN until August 17, Also see Table 20.


More information

An axiomatic characterization of the Hirsch-index

An axiomatic characterization of the Hirsch-index Mathematical Social Sciences 56 (2008) 224 232 Contents lists available at ScienceDirect Mathematical Social Sciences journal homepage: www.elsevier.com/locate/econbase An axiomatic characterization of

More information

Quantitative Methods Chapter 0: Review of Basic Concepts 0.1 Business Applications (II) 0.2 Business Applications (III)

Quantitative Methods Chapter 0: Review of Basic Concepts 0.1 Business Applications (II) 0.2 Business Applications (III) Quantitative Methods Chapter 0: Review of Basic Concepts 0.1 Business Applications (II) 0.1.1 Simple Interest 0.2 Business Applications (III) 0.2.1 Expenses Involved in Buying a Car 0.2.2 Expenses Involved

More information

The effect of database dirty data on h-index calculation

The effect of database dirty data on h-index calculation Scientometrics (2013) 95:1179 1188 DOI 10.1007/s11192-012-0871-x The effect of database dirty data on h-index calculation Franceschini Fiorenzo Maisano Domenico Mastrogiacomo Luca Received: 31 July 2012

More information

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND Article Influence Score = 5YIF divided by 2 Chia-Lin Chang, Michael McAleer, and

More information

The h-index of a conglomerate

The h-index of a conglomerate 1 The h-index of a conglomerate Ronald Rousseau 1,2, Raf Guns 3, Yuxian Liu 3,4 1 KHBO (Association K.U.Leuven), Industrial Sciences and Technology, Zeedijk 101, B-8400 Oostende, Belgium E-mail: ronald.rousseau@khbo.be

More information

2017 Scientific Ocean Drilling Bibliographic Database Report

2017 Scientific Ocean Drilling Bibliographic Database Report 2017 Scientific Ocean Drilling Bibliographic Database Report Covering records related to the Deep Sea Drilling Project, Ocean Drilling Program, Integrated Ocean Drilling Program, and International Ocean

More information

EVALUATING THE USAGE AND IMPACT OF E-JOURNALS IN THE UK

EVALUATING THE USAGE AND IMPACT OF E-JOURNALS IN THE UK E v a l u a t i n g t h e U s a g e a n d I m p a c t o f e - J o u r n a l s i n t h e U K p a g e 1 EVALUATING THE USAGE AND IMPACT OF E-JOURNALS IN THE UK BIBLIOMETRIC INDICATORS FOR CASE STUDY INSTITUTIONS

More information

Math Literacy. Curriculum (457 topics)

Math Literacy. Curriculum (457 topics) Math Literacy This course covers the topics shown below. Students navigate learning paths based on their level of readiness. Institutional users may customize the scope and sequence to meet curricular

More information

The Scope and Growth of Spatial Analysis in the Social Sciences

The Scope and Growth of Spatial Analysis in the Social Sciences context. 2 We applied these search terms to six online bibliographic indexes of social science Completed as part of the CSISS literature search initiative on November 18, 2003 The Scope and Growth of Spatial

More information

lamsade 352 Avril 2014 Laboratoire d Analyse et Modélisation de Systèmes pour l Aide à la Décision UMR 7243

lamsade 352 Avril 2014 Laboratoire d Analyse et Modélisation de Systèmes pour l Aide à la Décision UMR 7243 lamsade Laboratoire d Analyse et Modélisation de Systèmes pour l Aide à la Décision UMR 7243 CAHIER DU LAMSADE 352 Avril 2014 An axiomatic approach to bibliometric rankings and indices Denis Bouyssou,

More information

Algebra 1 Scope and Sequence Standards Trajectory

Algebra 1 Scope and Sequence Standards Trajectory Algebra 1 Scope and Sequence Standards Trajectory Course Name Algebra 1 Grade Level High School Conceptual Category Domain Clusters Number and Quantity Algebra Functions Statistics and Probability Modeling

More information

Locating an Astronomy and Astrophysics Publication Set in a Map of the Full Scopus Database

Locating an Astronomy and Astrophysics Publication Set in a Map of the Full Scopus Database Locating an Astronomy and Astrophysics Publication Set in a Map of the Full Scopus Database Kevin W. Boyack 1 1 kboyack@mapofscience.com SciTech Strategies, Inc., 8421 Manuel Cia Pl NE, Albuquerque, NM

More information

1 Bewley Economies with Aggregate Uncertainty

1 Bewley Economies with Aggregate Uncertainty 1 Bewley Economies with Aggregate Uncertainty Sofarwehaveassumedawayaggregatefluctuations (i.e., business cycles) in our description of the incomplete-markets economies with uninsurable idiosyncratic risk

More information

Algebra 2. Curriculum (524 topics additional topics)

Algebra 2. Curriculum (524 topics additional topics) Algebra 2 This course covers the topics shown below. Students navigate learning paths based on their level of readiness. Institutional users may customize the scope and sequence to meet curricular needs.

More information

Facultad de Física e Inteligencia Artificial. Universidad Veracruzana, Apdo. Postal 475. Xalapa, Veracruz. México.

Facultad de Física e Inteligencia Artificial. Universidad Veracruzana, Apdo. Postal 475. Xalapa, Veracruz. México. arxiv:cond-mat/0411161v1 [cond-mat.other] 6 Nov 2004 On fitting the Pareto-Levy distribution to stock market index data: selecting a suitable cutoff value. H.F. Coronel-Brizio a and A.R. Hernandez-Montoya

More information

Mathematics: applications and interpretation SL

Mathematics: applications and interpretation SL Mathematics: applications and interpretation SL Chapter 1: Approximations and error A Rounding numbers B Approximations C Errors in measurement D Absolute and percentage error The first two sections of

More information

Internet Appendix for Sentiment During Recessions

Internet Appendix for Sentiment During Recessions Internet Appendix for Sentiment During Recessions Diego García In this appendix, we provide additional results to supplement the evidence included in the published version of the paper. In Section 1, we

More information

Testing Problems with Sub-Learning Sample Complexity

Testing Problems with Sub-Learning Sample Complexity Testing Problems with Sub-Learning Sample Complexity Michael Kearns AT&T Labs Research 180 Park Avenue Florham Park, NJ, 07932 mkearns@researchattcom Dana Ron Laboratory for Computer Science, MIT 545 Technology

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle  holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/39637 holds various files of this Leiden University dissertation Author: Smit, Laurens Title: Steady-state analysis of large scale systems : the successive

More information

Index I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474

Index I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474 Index A Absolute value explanation of, 40, 81 82 of slope of lines, 453 addition applications involving, 43 associative law for, 506 508, 570 commutative law for, 238, 505 509, 570 English phrases for,

More information

Computational Complexity

Computational Complexity p. 1/24 Computational Complexity The most sharp distinction in the theory of computation is between computable and noncomputable functions; that is, between possible and impossible. From the example of

More information

Econ671 Factor Models: Principal Components

Econ671 Factor Models: Principal Components Econ671 Factor Models: Principal Components Jun YU April 8, 2016 Jun YU () Econ671 Factor Models: Principal Components April 8, 2016 1 / 59 Factor Models: Principal Components Learning Objectives 1. Show

More information

Sequenced Units for Arizona s College and Career Ready Standards MA35 Personal Finance Year at a Glance

Sequenced Units for Arizona s College and Career Ready Standards MA35 Personal Finance Year at a Glance Unit 1: Prepare a Budget (20 days) Unit 2: Employment Basics (15 days) Unit 3: Modeling a Business (20 days) Unit 4: Banking Services (15 days) Unit 5: Consumer Credit (15 days) Unit 6: Automobile Ownership

More information

Application of Bradford's Law of Scattering to the Economics Literature of India and China: A Comparative Study

Application of Bradford's Law of Scattering to the Economics Literature of India and China: A Comparative Study Asian Journal of Information Science and Technology ISSN: 2231-6108 Vol. 9 No.1, 2019, pp. 1-7 The Research Publication, www.trp.org.in Application of Bradford's Law of Scattering to the Economics Literature

More information

THE H- AND A-INDEXES IN ASTRONOMY

THE H- AND A-INDEXES IN ASTRONOMY Organizations, People and Strategies in Astronomy I (OPSA I), 245-252 Ed. A. Heck,. THE H- AND A-INDEXES IN ASTRONOMY HELMUT A. ABT Kitt Peak National Observatory P.O. Box 26732 Tucson AZ 85726-6732, U.S.A.

More information

Instructor Notes for Chapters 3 & 4

Instructor Notes for Chapters 3 & 4 Algebra for Calculus Fall 0 Section 3. Complex Numbers Goal for students: Instructor Notes for Chapters 3 & 4 perform computations involving complex numbers You might want to review the quadratic formula

More information

Solutions to MAT 117 Test #3

Solutions to MAT 117 Test #3 Solutions to MAT 7 Test #3 Because there are two versions of the test, solutions will only be given for Form C. Differences from the Form D version will be given. (The values for Form C appear above those

More information

arxiv: v2 [physics.soc-ph] 27 Oct 2008

arxiv: v2 [physics.soc-ph] 27 Oct 2008 Universality of citation distributions: towards an objective measure of scientific impact. Filippo Radicchi and Santo Fortunato Complex Networks Lagrange Laboratory (CNLL, ISI Foundation, Torino, Italy

More information

Filtering bond and credit default swap markets

Filtering bond and credit default swap markets Filtering bond and credit default swap markets Peter Cotton May 20, 2017 Overview Disclaimer Filtering credit markets A visual introduction A unified state space for bond and CDS markets Kalman filtering

More information

Stat 101 L: Laboratory 5

Stat 101 L: Laboratory 5 Stat 101 L: Laboratory 5 The first activity revisits the labeling of Fun Size bags of M&Ms by looking distributions of Total Weight of Fun Size bags and regular size bags (which have a label weight) of

More information

CS1800: Sequences & Sums. Professor Kevin Gold

CS1800: Sequences & Sums. Professor Kevin Gold CS1800: Sequences & Sums Professor Kevin Gold Moving Toward Analysis of Algorithms Today s tools help in the analysis of algorithms. We ll cover tools for deciding what equation best fits a sequence of

More information

The focus of this study is whether evolutionary medicine can be considered a distinct scientific discipline.

The focus of this study is whether evolutionary medicine can be considered a distinct scientific discipline. The purpose of this study is to characterize trends in the application of evolutionary biology in health and disease. The focus of this study is whether evolutionary medicine can be considered a distinct

More information

Faculty Working Paper Series

Faculty Working Paper Series Faculty Working Paper Series No. 2, October 2007 A Bibliometric Note on Isserman s Panegyric Statistics Nikias Sarafoglou Items in this collection are copyrighted, with all rights reserved and chosen by

More information

APPLICATION OF NUMERICAL METHODS IN CIVIL ENGINEERING

APPLICATION OF NUMERICAL METHODS IN CIVIL ENGINEERING 12 February, 2018 APPLICATION OF NUMERICAL METHODS IN CIVIL ENGINEERING Document Filetype: PDF 199.36 KB 0 APPLICATION OF NUMERICAL METHODS IN CIVIL ENGINEERING Interpolation and numerical differentiation

More information

Mathematical derivation of the impact factor distribution Link Peer-reviewed author version

Mathematical derivation of the impact factor distribution Link Peer-reviewed author version Mathematical derivation of the impact factor distribution Link Peerreviewed author version Made available by Hasselt University Library in Document Server@UHasselt Reference (Published version: EGGHE,

More information

Support for UCL Mathematics offer holders with the Sixth Term Examination Paper

Support for UCL Mathematics offer holders with the Sixth Term Examination Paper 1 Support for UCL Mathematics offer holders with the Sixth Term Examination Paper The Sixth Term Examination Paper (STEP) examination tests advanced mathematical thinking and problem solving. The examination

More information

Corporate Governance, and the Returns on Investment

Corporate Governance, and the Returns on Investment Corporate Governance, and the Returns on Investment Klaus Gugler, Dennis C. Mueller and B. Burcin Yurtoglu University of Vienna, Department of Economics BWZ, Bruennerstr. 72, A-1210, Vienna 1 Considerable

More information

Taylor series. Chapter Introduction From geometric series to Taylor polynomials

Taylor series. Chapter Introduction From geometric series to Taylor polynomials Chapter 2 Taylor series 2. Introduction The topic of this chapter is find approximations of functions in terms of power series, also called Taylor series. Such series can be described informally as infinite

More information

Example 1: Dear Abby. Stat Camp for the MBA Program

Example 1: Dear Abby. Stat Camp for the MBA Program Stat Camp for the MBA Program Daniel Solow Lecture 4 The Normal Distribution and the Central Limit Theorem 187 Example 1: Dear Abby You wrote that a woman is pregnant for 266 days. Who said so? I carried

More information

arxiv: v1 [physics.soc-ph] 27 Aug 2013

arxiv: v1 [physics.soc-ph] 27 Aug 2013 The Z-index: A geometric representation of productivity and impact which accounts for information in the entire rank-citation profile Alexander M. Petersen 1 and Sauro Succi 2, 3 1 IMT Lucca Institute

More information

Supplementary Information - On the Predictability of Future Impact in Science

Supplementary Information - On the Predictability of Future Impact in Science Supplementary Information - On the Predictability of uture Impact in Science Orion Penner, Raj K. Pan, lexander M. Petersen, Kimmo Kaski, and Santo ortunato Laboratory of Innovation Management and conomics,

More information

The detection of hot regions in the geography of science. A visualization approach by using density maps

The detection of hot regions in the geography of science. A visualization approach by using density maps The detection of hot regions in the geography of science A visualization approach by using density maps Lutz Bornmann$, Ludo Waltman $ Max Planck Society, Hofgartenstr. 8, 80539 Munich, Germany Centre

More information

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

NATCOR. Forecast Evaluation. Forecasting with ARIMA models. Nikolaos Kourentzes

NATCOR. Forecast Evaluation. Forecasting with ARIMA models. Nikolaos Kourentzes NATCOR Forecast Evaluation Forecasting with ARIMA models Nikolaos Kourentzes n.kourentzes@lancaster.ac.uk O u t l i n e 1. Bias measures 2. Accuracy measures 3. Evaluation schemes 4. Prediction intervals

More information

HARNESSING THE WISDOM OF CROWDS

HARNESSING THE WISDOM OF CROWDS 1 HARNESSING THE WISDOM OF CROWDS Zhi Da, University of Notre Dame Xing Huang, Michigan State University Second Annual News & Finance Conference March 8, 2017 2 Many important decisions in life are made

More information

Computing Consecutive-Type Reliabilities Non-Recursively

Computing Consecutive-Type Reliabilities Non-Recursively IEEE TRANSACTIONS ON RELIABILITY, VOL. 52, NO. 3, SEPTEMBER 2003 367 Computing Consecutive-Type Reliabilities Non-Recursively Galit Shmueli Abstract The reliability of consecutive-type systems has been

More information

Time Series 4. Robert Almgren. Oct. 5, 2009

Time Series 4. Robert Almgren. Oct. 5, 2009 Time Series 4 Robert Almgren Oct. 5, 2009 1 Nonstationarity How should you model a process that has drift? ARMA models are intrinsically stationary, that is, they are mean-reverting: when the value of

More information

Computational Tasks and Models

Computational Tasks and Models 1 Computational Tasks and Models Overview: We assume that the reader is familiar with computing devices but may associate the notion of computation with specific incarnations of it. Our first goal is to

More information

Co-authorship networks in South African chemistry and mathematics

Co-authorship networks in South African chemistry and mathematics Research Articles South African Journal of Science 104, November/December 2008 487 Co-authorship networks in South African chemistry and mathematics Ian N. Durbach a*, Deevashan Naidoo a and Johann Mouton

More information

Calculus from Graphical, Numerical, and Symbolic Points of View Overview of 2nd Edition

Calculus from Graphical, Numerical, and Symbolic Points of View Overview of 2nd Edition Calculus from Graphical, Numerical, and Symbolic Points of View Overview of 2nd Edition General notes. These informal notes briefly overview plans for the 2nd edition (2/e) of the Ostebee/Zorn text. This

More information

Algebra I High School Math Solution West Virginia Correlation

Algebra I High School Math Solution West Virginia Correlation M.A1HS.1 M.A1HS.2 M.A1HS.4a M.A1HS.4b Use units as a way to understand problems and to guide the solution of multi-step problems; choose and interpret units consistently in formulas; choose and interpret

More information

Deceptive Advertising with Rational Buyers

Deceptive Advertising with Rational Buyers Deceptive Advertising with Rational Buyers September 6, 016 ONLINE APPENDIX In this Appendix we present in full additional results and extensions which are only mentioned in the paper. In the exposition

More information

Doing Right By Massive Data: How To Bring Probability Modeling To The Analysis Of Huge Datasets Without Taking Over The Datacenter

Doing Right By Massive Data: How To Bring Probability Modeling To The Analysis Of Huge Datasets Without Taking Over The Datacenter Doing Right By Massive Data: How To Bring Probability Modeling To The Analysis Of Huge Datasets Without Taking Over The Datacenter Alexander W Blocker Pavlos Protopapas Xiao-Li Meng 9 February, 2010 Outline

More information

Transparent Structural Estimation. Matthew Gentzkow Fisher-Schultz Lecture (from work w/ Isaiah Andrews & Jesse M. Shapiro)

Transparent Structural Estimation. Matthew Gentzkow Fisher-Schultz Lecture (from work w/ Isaiah Andrews & Jesse M. Shapiro) Transparent Structural Estimation Matthew Gentzkow Fisher-Schultz Lecture (from work w/ Isaiah Andrews & Jesse M. Shapiro) 1 A hallmark of contemporary applied microeconomics is a conceptual framework

More information

Algebra 1 Mathematics: to Hoover City Schools

Algebra 1 Mathematics: to Hoover City Schools Jump to Scope and Sequence Map Units of Study Correlation of Standards Special Notes Scope and Sequence Map Conceptual Categories, Domains, Content Clusters, & Standard Numbers NUMBER AND QUANTITY (N)

More information

Mathematics Standards for High School Advanced Quantitative Reasoning

Mathematics Standards for High School Advanced Quantitative Reasoning Mathematics Standards for High School Advanced Quantitative Reasoning The Advanced Quantitative Reasoning (AQR) course is a fourth-year launch course designed to be an alternative to Precalculus that prepares

More information

Mathematics Standards for High School Financial Algebra A and Financial Algebra B

Mathematics Standards for High School Financial Algebra A and Financial Algebra B Mathematics Standards for High School Financial Algebra A and Financial Algebra B Financial Algebra A and B are two semester courses that may be taken in either order or one taken without the other; both

More information

THE COMPLETE IDIOT'S GUIDE TO ASTROLOGY (2ND EDITION) BY MADELINE GERWICK-BRODEUR, LISA LENARD

THE COMPLETE IDIOT'S GUIDE TO ASTROLOGY (2ND EDITION) BY MADELINE GERWICK-BRODEUR, LISA LENARD Read Online and Download Ebook THE COMPLETE IDIOT'S GUIDE TO ASTROLOGY (2ND EDITION) BY MADELINE GERWICK-BRODEUR, LISA LENARD DOWNLOAD EBOOK : THE COMPLETE IDIOT'S GUIDE TO ASTROLOGY (2ND Click link bellow

More information

A Correlation of. To the. North Carolina Standard Course of Study for Mathematics High School Math 1

A Correlation of. To the. North Carolina Standard Course of Study for Mathematics High School Math 1 A Correlation of 2018 To the North Carolina Standard Course of Study for Mathematics High School Math 1 Table of Contents Standards for Mathematical Practice... 1 Number and Quantity... 8 Algebra... 9

More information

b14 c04 a15 a03 b02 Edge Decompositions on Graphs

b14 c04 a15 a03 b02 Edge Decompositions on Graphs Robert A. Beeler, Ph.D. Assistant Professor Department of Mathematics and Statistics East Tennessee State University Gilbreath Hall 308A Statement of Research My research has focused on graph theory. Currently,

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices

MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices We will now switch gears and focus on a branch of mathematics known as linear algebra. There are a few notes worth making before

More information

Common Core State Standards with California Additions 1 Standards Map. Algebra I

Common Core State Standards with California Additions 1 Standards Map. Algebra I Common Core State s with California Additions 1 s Map Algebra I *Indicates a modeling standard linking mathematics to everyday life, work, and decision-making N-RN 1. N-RN 2. Publisher Language 2 Primary

More information

Causal Inference with Big Data Sets

Causal Inference with Big Data Sets Causal Inference with Big Data Sets Marcelo Coca Perraillon University of Colorado AMC November 2016 1 / 1 Outlone Outline Big data Causal inference in economics and statistics Regression discontinuity

More information

Death Of Soldiers In Iraq During Gulf War II

Death Of Soldiers In Iraq During Gulf War II Death Of Soldiers In Iraq During Gulf War II The field of reliability is concerned with identifying, predicting, and preventing failures. One way to make reliability obvious is to prepare Crow/AMSAA plots

More information

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

6.207/14.15: Networks Lecture 12: Generalized Random Graphs 6.207/14.15: Networks Lecture 12: Generalized Random Graphs 1 Outline Small-world model Growing random networks Power-law degree distributions: Rich-Get-Richer effects Models: Uniform attachment model

More information

Observations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra

Observations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra September The Building Blocks of Algebra Rates, Patterns and Problem Solving Variables and Expressions The Commutative and Associative Properties The Distributive Property Equivalent Expressions Seeing

More information

Regression Analysis in R

Regression Analysis in R Regression Analysis in R 1 Purpose The purpose of this activity is to provide you with an understanding of regression analysis and to both develop and apply that knowledge to the use of the R statistical

More information

Physics 509: Non-Parametric Statistics and Correlation Testing

Physics 509: Non-Parametric Statistics and Correlation Testing Physics 509: Non-Parametric Statistics and Correlation Testing Scott Oser Lecture #19 Physics 509 1 What is non-parametric statistics? Non-parametric statistics is the application of statistical tests

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

MATHEMATICS COURSE SYLLABUS

MATHEMATICS COURSE SYLLABUS Course Title: Algebra 1 Honors Department: Mathematics MATHEMATICS COURSE SYLLABUS Primary Course Materials: Big Ideas Math Algebra I Book Authors: Ron Larson & Laurie Boswell Algebra I Student Workbooks

More information

Nonlinear Regression Act4 Exponential Predictions (Statcrunch)

Nonlinear Regression Act4 Exponential Predictions (Statcrunch) Nonlinear Regression Act4 Exponential Predictions (Statcrunch) Directions: Now that we have established the exponential relationships with these variables and analyzed the residuals, let s use the equations

More information

Time Series Analysis. Smoothing Time Series. 2) assessment of/accounting for seasonality. 3) assessment of/exploiting "serial correlation"

Time Series Analysis. Smoothing Time Series. 2) assessment of/accounting for seasonality. 3) assessment of/exploiting serial correlation Time Series Analysis 2) assessment of/accounting for seasonality This (not surprisingly) concerns the analysis of data collected over time... weekly values, monthly values, quarterly values, yearly values,

More information

Econometrics I. Professor William Greene Stern School of Business Department of Economics 1-1/40. Part 1: Introduction

Econometrics I. Professor William Greene Stern School of Business Department of Economics 1-1/40. Part 1: Introduction Econometrics I Professor William Greene Stern School of Business Department of Economics 1-1/40 http://people.stern.nyu.edu/wgreene/econometrics/econometrics.htm 1-2/40 Overview: This is an intermediate

More information

Primary classes of compositions of numbers

Primary classes of compositions of numbers Annales Mathematicae et Informaticae 41 (2013) pp. 193 204 Proceedings of the 15 th International Conference on Fibonacci Numbers and Their Applications Institute of Mathematics and Informatics, Eszterházy

More information

3 Continuous Random Variables

3 Continuous Random Variables Jinguo Lian Math437 Notes January 15, 016 3 Continuous Random Variables Remember that discrete random variables can take only a countable number of possible values. On the other hand, a continuous random

More information

Lecture 2: Linear regression

Lecture 2: Linear regression Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued

More information

Article-Count Impact Factor of Materials Science Journals in SCI Database

Article-Count Impact Factor of Materials Science Journals in SCI Database Proof version Article-Count Impact Factor of Materials Science Journals in SCI Database Markpin T 1,*, Boonradsamee B 2,4, Ruksinsut K 4, Yochai W 2, Premkamolnetr N 3, Ratchatahirun P 1 and Sombatsompop

More information

Writing Patent Specifications

Writing Patent Specifications Writing Patent Specifications Japan Patent Office Asia-Pacific Industrial Property Center, JIPII 2013 Collaborator: Shoji HADATE, Patent Attorney, Intellectual Property Office NEXPAT CONTENTS Page 1. Patent

More information

Undergraduate Notes in Mathematics. Arkansas Tech University Department of Mathematics. College Algebra for STEM

Undergraduate Notes in Mathematics. Arkansas Tech University Department of Mathematics. College Algebra for STEM Undergraduate Notes in Mathematics Arkansas Tech University Department of Mathematics College Algebra for STEM Marcel B. Finan c All Rights Reserved 2015 Edition To my children Amin & Nadia Preface From

More information

Two years of high school algebra and ACT math score of at least 19; or DSPM0850 or equivalent math placement score.

Two years of high school algebra and ACT math score of at least 19; or DSPM0850 or equivalent math placement score. PELLISSIPPI STATE TECHNICAL COMMUNITY COLLEGE MASTER SYLLABUS INTERMEDIATE AND COLLEGE ALGEBRA DSPM0850 / MATH 1130 Class Hours: 3.0 Credit Hours: 6.0 Laboratory Hours: 0.0 Revised: Fall 05 Catalog Course

More information
